How to walk a tree
Contents
Tree traversal
Tree traversal is a common task when working with syntax trees. The term tree here means a node and all its descendants (all the nodes inside it). Traversal means stopping at every node to do something. So, tree traversal means doing something for every node in a tree.
Tree traversal is often also called “walking a tree”, or “visiting a tree”.
To learn more, continue reading, but when working with unist (unified’s trees) you probably need either:
Set up
Glad you’re still here! Let’s say we have the following fragment of HTML, in a file example.html
:
<p>
<!-- A comment. -->
Some <strong>strong importance</strong>, <em>emphasis</em>, and a dash of
<code>code</code>.
</p>
You could parse that with the following code (using unified
and rehype-parse
):
import fs from 'node:fs'
import {unified} from 'unified'
import rehypeParse from 'rehype-parse'
const doc = fs.readFileSync('example.html')
const tree = unified().use(rehypeParse, {fragment: true}).parse(doc)
console.log(tree)
Which would yield (ignoring positional info for brevity):
{
type: 'root',
children: [
{
type: 'element',
tagName: 'p',
properties: {},
children: [
{ type: 'text', value: '\n ' },
{ type: 'comment', value: ' A comment. ' },
{ type: 'text', value: '\n Some ' },
{
type: 'element',
tagName: 'strong',
properties: {},
children: [ { type: 'text', value: 'strong importance' } ]
},
{ type: 'text', value: ', ' },
{
type: 'element',
tagName: 'em',
properties: {},
children: [ { type: 'text', value: 'emphasis' } ]
},
{ type: 'text', value: ', and a dash of\n ' },
{
type: 'element',
tagName: 'code',
properties: {},
children: [ { type: 'text', value: 'code' } ]
},
{ type: 'text', value: '.\n' }
]
},
{ type: 'text', value: '\n' }
],
data: { quirksMode: false }
}
As we are all set up, we can traverse the tree.
Traverse the tree
A useful utility for that is unist-util-visit
, and it works like so:
import {visit} from 'unist-util-visit'
// …
visit(tree, (node) => {
console.log(node.type)
})
root
element
text
comment
text
element
text
text
element
text
text
element
text
text
text
We traversed the entire tree, and for each node, we printed its type
.
Visiting a certain kind of node
To “visit” only a certain type
of node, pass a test to unist-util-visit
like so:
import {visit} from 'unist-util-visit'
// …
visit(tree, 'element', (node) => {
console.log(node.tagName)
})
p
strong
em
code
You can do this yourself as well. The above works the same as:
visit(tree, (node) => {
if (node.type === 'element') {
console.log(node.tagName)
}
})
But the test passed to visit
can be more advanced, such as the following to visit different kinds of nodes.
visit(tree, ['comment', 'text'], (node) => {
console.log([node.value])
})
[ '\n ' ]
[ ' A comment. ' ]
[ '\n Some ' ]
[ 'strong importance' ]
[ ', ' ]
[ 'emphasis' ]
[ ', and a dash of\n ' ]
[ 'code' ]
[ '.\n' ]
[ '\n' ]
Read more about unist-util-visit
in its readme.