Understanding ASTs

Learn about abstract syntax trees (ASTs) and how they are used in codemods. This page covers the basics of ASTs, including what they are and how they are used to represent the structure of code. We'll also discuss how to read and manipulate ASTs in your codemods to automatically refactor your codebase.


Before writing your first codemod, it’s important to have a good conceptual understanding of ASTs (abstract syntax trees) and how to work with them.

Abstract Syntax Tree (aka AST)

An abstract syntax tree (AST) is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code.

Wiki: Abstract syntax tree

Abstract syntax trees can be thought of as simply the object representation of your code after it has been parsed. You might already be familiar with tools and libraries that do this, such as babeljs, recast and eslint. These tools parse files into ASTs in preparation for some work: a raw file is broken down into "Nodes", which are then organized and categorized into a hierarchical format that can be reasoned about, manipulated and output back into a file.

Code → AST → AST (mutated) → Code


Not all ASTs are the same. Different libraries, such as babel and recast, structure their ASTs differently, but once you're comfortable with one it's not too hard to wrap your head around another.

For instance, consider this snippet:

console.log('Hello, World');

The way you might categorise the anatomy of this expression in your mind is probably not too dissimilar to how it's actually done in recast (which is what is used in this project). Compare it to the actual AST generated by recast:

const AST = {
  type: 'File',
  program: {
    type: 'Program',
    sourceType: 'module',
    body: [
      {
        type: 'ExpressionStatement',
        expression: {
          type: 'CallExpression',
          callee: {
            type: 'MemberExpression',
            object: {
              type: 'Identifier',
              name: 'console'
            },
            computed: false,
            property: {
              type: 'Identifier',
              name: 'log'
            }
          },
          arguments: [
            {
              type: 'StringLiteral',
              extra: {
                rawValue: 'Hello, World',
                raw: "'Hello, World'"
              },
              value: 'Hello, World'
            }
          ]
        }
      }
    ]
  }
};

Your initial reaction might be "wow, that's a lot for a simple console.log", and that's a totally fair assessment. But look a little closer and you should start to see some order among the chaos. Every node in this tree is given a "type", and these types are key to how you navigate and interact with this object.
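To make the role of "type" concrete, here is a minimal sketch that walks a trimmed copy of the tree above and lists every node type it finds. It uses plain objects only, with no parser involved; collectTypes is a hypothetical helper written for this illustration and is not part of recast or jscodeshift.

```javascript
// A trimmed copy of the console.log AST shown above (plain object, no parser needed).
const ast = {
  type: 'File',
  program: {
    type: 'Program',
    body: [
      {
        type: 'ExpressionStatement',
        expression: {
          type: 'CallExpression',
          callee: {
            type: 'MemberExpression',
            object: { type: 'Identifier', name: 'console' },
            computed: false,
            property: { type: 'Identifier', name: 'log' },
          },
          arguments: [{ type: 'StringLiteral', value: 'Hello, World' }],
        },
      },
    ],
  },
};

// Hypothetical helper: visits every object that has a string `type`
// property, depth-first, and records the types in visit order.
function collectTypes(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectTypes(child, found));
  } else if (node !== null && typeof node === 'object') {
    if (typeof node.type === 'string') found.push(node.type);
    Object.values(node).forEach((child) => collectTypes(child, found));
  }
  return found;
}

const types = collectTypes(ast);
console.log(types.join(' > '));
// File > Program > ExpressionStatement > CallExpression > MemberExpression > Identifier > Identifier > StringLiteral
```

Real tooling gives you far more convenient ways to search a tree, but the idea is the same: you find nodes by their type and then read or change their properties.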

For example, if you want to know which arguments are passed to this method call, look at the arguments property on the CallExpression (the expression held by the ExpressionStatement). Arguments can be conceptually thought of as an array, so it's no surprise that we're presented with an array here. Our array has one element, which is of type StringLiteral and carries its own metadata, including the value we're looking for: Hello, World. Hooray 🎉.
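As a rough sketch, reading that value out of the tree by hand could look like this. It works on a plain-object copy of the statement node from the example above, so nothing here depends on a particular library.

```javascript
// Only the part of the AST needed here: the ExpressionStatement from the tree above.
const statement = {
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: {
      type: 'MemberExpression',
      object: { type: 'Identifier', name: 'console' },
      computed: false,
      property: { type: 'Identifier', name: 'log' },
    },
    arguments: [{ type: 'StringLiteral', value: 'Hello, World' }],
  },
};

// The arguments live on the CallExpression, one level below the statement.
const call = statement.expression;
if (call.type !== 'CallExpression') {
  throw new Error(`Expected a CallExpression, got ${call.type}`);
}

// Check the node type before trusting that `value` is a string.
const [firstArg] = call.arguments;
const message = firstArg.type === 'StringLiteral' ? firstArg.value : undefined;

console.log(message); // Hello, World
```

Checking the type first matters because an argument could just as well be an Identifier or another CallExpression, neither of which has a string value.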

Great, but where do we go from here? Well, it totally depends on what you're trying to achieve. If you want to replace the message that's logged, all you'd have to do is replace the StringLiteral node with one containing the appropriate message. If you want to pass an additional argument to the method, you could simply push a new node into the arguments array, and so on.
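Both changes can be sketched on a plain-object copy of the statement node. The new message, the extra user argument, and the toSource printer are all assumptions made up for this illustration; in a real codemod you would build nodes with your library's builders and let it print the result.

```javascript
// Plain-object copy of the statement node from the example above.
const statement = {
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: {
      type: 'MemberExpression',
      object: { type: 'Identifier', name: 'console' },
      computed: false,
      property: { type: 'Identifier', name: 'log' },
    },
    arguments: [{ type: 'StringLiteral', value: 'Hello, World' }],
  },
};

const call = statement.expression;

// Replace the logged message by swapping in a new StringLiteral node.
call.arguments[0] = { type: 'StringLiteral', value: 'Goodbye, World' };

// Pass an additional argument by pushing another node onto the array.
// `user` is a made-up identifier name for this example.
call.arguments.push({ type: 'Identifier', name: 'user' });

// Hypothetical, deliberately tiny printer. It only understands the five
// node types used here and only non-computed member access (a.b).
function toSource(node) {
  switch (node.type) {
    case 'ExpressionStatement':
      return `${toSource(node.expression)};`;
    case 'CallExpression':
      return `${toSource(node.callee)}(${node.arguments.map(toSource).join(', ')})`;
    case 'MemberExpression':
      return `${toSource(node.object)}.${toSource(node.property)}`;
    case 'Identifier':
      return node.name;
    case 'StringLiteral':
      return `'${node.value}'`;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

console.log(toSource(statement)); // console.log('Goodbye, World', user);
```

This is the whole Code, AST, mutated AST, Code round trip in miniature: the edits are ordinary object and array operations, and the printer turns the modified tree back into source text.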

The point is, with this abstract data structure we can inspect and manipulate it into any shape we want! A perfect tool for us codemoders!

AST Explorer

ASTs, like the one in the example above, can get quite large, so you can imagine how big one for a typical JavaScript source file could get 😱! How can one make sense of it all? Luckily, there are indispensable tools out there to help...

astexplorer.net is one of those tools.

AST Explorer screenshot

It provides a real-time, inspectable representation of your code as an AST, and lets you write and test transforms against it live in the browser. It also supports other ASTs, such as babel and typescript, so for our use case we'll need to configure it a bit to support recast + jscodeshift.

To configure it, follow these steps:

  1. Set the language to JavaScript
  2. Set the transformer to recast
  3. If you want to enable TypeScript support, click the little cog icon and set the parser to TypeScript
  4. Finally, turn the "transform" toggle on. This should show some new panels for writing and testing your transforms.

And there you go, you're all set up with a sandbox to test, learn and experiment with! From here you should have enough of a grasp of ASTs to try your hand at writing your first codemod.