Architecture
This document describes the internal architecture of the Djot PHP parser.
Overview
The library follows a classic parser/renderer architecture:
Input (Djot String)
↓
BlockParser
↓
AST (Document)
↓
HtmlRenderer
↓
Output (HTML String)
Components
BlockParser
The BlockParser is responsible for:
- Splitting input into lines
- Extracting reference definitions (first pass)
- Extracting footnote definitions (first pass)
- Parsing block-level elements (second pass)
- Delegating inline parsing to InlineParser
Key methods:
- parse(string $input): Document - Main entry point
- tryParse*() methods - Each returns the number of lines consumed, or null if the block does not match
InlineParser
The InlineParser handles inline elements within block content:
- Emphasis, strong, code spans
- Links, images, autolinks
- Smart typography
- Special syntax (math, symbols, footnote refs)
It uses a delimiter stack approach for paired delimiters like _emphasis_.
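To illustrate the idea, here is a minimal sketch of a delimiter stack for `_emphasis_` and `*strong*`. It is not the library's InlineParser: the function name `renderInlineSketch` is hypothetical, and the sketch ignores Djot's rules about whitespace around delimiters, escapes, and HTML escaping.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: render _emphasis_ and *strong* with a delimiter stack.
 * HTML escaping and Djot's whitespace rules are deliberately left out.
 */
function renderInlineSketch(string $text): string
{
    $tags = ['_' => 'em', '*' => 'strong'];
    $parts = []; // output fragments; delimiters start out as literal text
    $stack = []; // open delimiters as [char, index into $parts]

    foreach (str_split($text) as $char) {
        if (!isset($tags[$char])) {
            $parts[] = $char;
            continue;
        }
        // Look for the nearest opener of the same kind, from the top down.
        $match = null;
        for ($i = count($stack) - 1; $i >= 0; $i--) {
            if ($stack[$i][0] === $char) {
                $match = $i;
                break;
            }
        }
        if ($match === null) {
            // No opener yet: remember this one and keep it literal for now.
            $stack[] = [$char, count($parts)];
            $parts[] = $char;
            continue;
        }
        // Closer found: upgrade the opener to a tag and discard any
        // unmatched openers above it, which stay as literal text.
        $parts[$stack[$match][1]] = '<' . $tags[$char] . '>';
        $parts[] = '</' . $tags[$char] . '>';
        $stack = array_slice($stack, 0, $match);
    }

    return implode('', $parts);
}
```

Because every delimiter is first emitted as literal text and only upgraded to a tag once its partner is found, unmatched delimiters need no cleanup step.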
HtmlRenderer
The HtmlRenderer walks the AST and produces HTML:
- Uses PHP 8 match expression for node type dispatch
- Supports XHTML mode for self-closing tags
- Handles attribute rendering
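The dispatch style described above could look roughly like the following. All class names here (`SketchNode`, `SketchRenderer`, and so on) are hypothetical stand-ins rather than the library's real classes, and the `<br>` output is only an assumption chosen to show the XHTML switch.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for AST nodes; the library's real classes differ.
abstract class SketchNode
{
    /** @param SketchNode[] $children */
    public function __construct(public array $children = [], public string $text = '') {}
}
final class SketchParagraph extends SketchNode {}
final class SketchText extends SketchNode {}
final class SketchBreak extends SketchNode {}

final class SketchRenderer
{
    public function __construct(private bool $xhtml = false) {}

    public function render(SketchNode $node): string
    {
        // match (true) evaluates the arms in order and takes the first true one.
        return match (true) {
            $node instanceof SketchParagraph => '<p>' . $this->renderChildren($node) . '</p>',
            $node instanceof SketchText => htmlspecialchars($node->text),
            $node instanceof SketchBreak => $this->xhtml ? '<br />' : '<br>',
            default => $this->renderChildren($node),
        };
    }

    private function renderChildren(SketchNode $node): string
    {
        return implode('', array_map(
            fn (SketchNode $child): string => $this->render($child),
            $node->children
        ));
    }
}

$doc = new SketchParagraph([
    new SketchText(text: 'a & b'),
    new SketchBreak(),
    new SketchText(text: 'c'),
]);
echo (new SketchRenderer(xhtml: true))->render($doc), "\n";
```

Keeping the dispatch in one match expression means a new node type needs only one extra arm plus its render method.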
AST Structure
Node Hierarchy
Node (abstract)
├── Block\BlockNode (abstract)
│ ├── Document
│ ├── Paragraph
│ ├── Heading
│ ├── CodeBlock
│ ├── BlockQuote
│ ├── ListBlock
│ ├── ListItem
│ ├── Table
│ ├── TableRow
│ ├── TableCell
│ ├── Div
│ ├── ThematicBreak
│ ├── DefinitionList
│ ├── DefinitionTerm
│ ├── DefinitionDescription
│ ├── Footnote
│ ├── RawBlock
│ └── Comment
│
└── Inline\InlineNode (abstract)
  ├── Text
  ├── Emphasis
  ├── Strong
  ├── Code
  ├── Link
  ├── Image
  ├── HardBreak
  ├── SoftBreak
  ├── Span
  ├── Superscript
  ├── Subscript
  ├── Highlight
  ├── Insert
  ├── Delete
  ├── FootnoteRef
  ├── Math
  ├── Symbol
  └── RawInline
Node Properties
Each node can have:
- Children: Other nodes contained within
- Attributes: Key-value pairs (id, class, custom)
- Type-specific data: e.g., heading level, link URL
Parsing Strategy
Two-Pass Block Parsing
First pass: Extract reference definitions and footnotes
- These can appear anywhere but are needed during inline parsing
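As a rough sketch of why the first pass exists, the function below collects `[label]: url` lines into a lookup table before any inline content is parsed. The name `extractReferencesSketch` and the regular expression are assumptions for illustration; the library's actual first pass may handle multi-line definitions, attributes, and footnotes differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical first pass: collect `[label]: url` definitions from all lines.
 *
 * @param string[] $lines
 * @return array<string, string> map of label to URL
 */
function extractReferencesSketch(array $lines): array
{
    $references = [];
    foreach ($lines as $line) {
        // The first character class excludes `^`, so footnote definitions
        // such as `[^1]: ...` are not mistaken for link references.
        if (preg_match('/^\[([^\]^][^\]]*)\]:\s*(\S+)\s*$/', $line, $m) === 1) {
            $references[$m[1]] = $m[2];
        }
    }

    return $references;
}

$lines = explode("\n", "See [the docs][ref].\n\n[ref]: https://example.com\n[^1]: A footnote.");
$references = extractReferencesSketch($lines);
// The link on the first line can now be resolved even though its
// definition appears later in the input.
```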
Second pass: Parse blocks in order
- Each block type has a tryParse*() method
- Methods return consumed line count or null
- First matching parser wins
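The "first matching parser wins" loop can be sketched as follows. The three closures are simplified, hypothetical stand-ins for the real tryParse*() methods, and `parseBlocksSketch` is not the library's `parseBlocks()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical block loop: try each parser in order; the first non-null result wins.
 *
 * @param string[] $lines
 * @return array<int, array{type: string, lines: string[]}>
 */
function parseBlocksSketch(array $lines): array
{
    // Each closure returns the number of lines it consumed, or null on no match.
    $parsers = [
        'heading' => fn (array $lines, int $i): ?int => str_starts_with($lines[$i], '#') ? 1 : null,
        'thematic_break' => fn (array $lines, int $i): ?int => trim($lines[$i]) === '***' ? 1 : null,
        'paragraph' => function (array $lines, int $i): ?int {
            // Fallback: consume everything up to the next blank line.
            $n = 0;
            while ($i + $n < count($lines) && trim($lines[$i + $n]) !== '') {
                $n++;
            }
            return $n > 0 ? $n : null;
        },
    ];

    $blocks = [];
    $i = 0;
    while ($i < count($lines)) {
        $consumed = null;
        foreach ($parsers as $type => $tryParse) {
            $consumed = $tryParse($lines, $i);
            if ($consumed !== null) {
                $blocks[] = ['type' => $type, 'lines' => array_slice($lines, $i, $consumed)];
                break;
            }
        }
        // Blank lines match nothing; skip one line so the loop always advances.
        $i += $consumed ?? 1;
    }

    return $blocks;
}

$blocks = parseBlocksSketch(['# Title', '', 'first line', 'second line', '', '***']);
```

Returning a line count instead of mutating a shared cursor keeps each parser free of side effects when it does not match.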
Block Precedence
Blocks are tried in this order:
- Comments ({% %})
- Raw blocks (``` =html)
- Code blocks (```)
- Divs (:::)
- Headings (#)
- Thematic breaks (***)
- Block quotes (>)
- Definition lists
- Lists (-, *, 1.)
- Tables (|)
- Footnote definitions
- Reference definitions
- Paragraphs (fallback)
Block Attributes
Block attributes {.class #id key=value} are parsed separately and stored in pendingAttributes. They're applied to the next block element.
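A minimal sketch of this pending-attribute flow, under the assumption that an attribute line is a single `{...}` line with whitespace-separated tokens. `parseAttributeLineSketch` is a hypothetical helper; it does not handle quoted values containing spaces.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical parser for a block attribute line such as `{.class #id key=value}`.
 *
 * @return array<string, string>|null attributes, or null if the line is not an attribute line
 */
function parseAttributeLineSketch(string $line): ?array
{
    if (preg_match('/^\{(.*)\}$/', trim($line), $m) !== 1) {
        return null;
    }

    $attributes = [];
    $classes = [];
    foreach (preg_split('/\s+/', trim($m[1]), -1, PREG_SPLIT_NO_EMPTY) as $token) {
        if ($token[0] === '.') {
            $classes[] = substr($token, 1);       // .class
        } elseif ($token[0] === '#') {
            $attributes['id'] = substr($token, 1); // #id
        } elseif (str_contains($token, '=')) {
            [$key, $value] = explode('=', $token, 2); // key=value
            $attributes[$key] = trim($value, '"');
        }
    }
    if ($classes !== []) {
        $attributes['class'] = implode(' ', $classes);
    }

    return $attributes;
}

// Attributes wait in a pending slot until the next block is created.
$pendingAttributes = parseAttributeLineSketch('{.note #intro lang=en}') ?? [];
$block = ['type' => 'paragraph', 'text' => 'Hello', 'attributes' => $pendingAttributes];
$pendingAttributes = []; // consumed by the block above
```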
Inline Parsing
The inline parser processes text character by character:
- Handle escapes (\*)
- Check for special syntax starts
- Match delimiters for paired elements
- Convert smart typography
- Collect remaining text
Smart Typography
Automatically converts:
- Straight quotes to curly quotes
- -- to en-dash
- --- to em-dash
- ... to ellipsis
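The dash and ellipsis rules can be sketched with `strtr()`. The function name `smartPunctuationSketch` is hypothetical, and quote conversion is left out because it depends on the surrounding characters.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the dash and ellipsis replacements.
 */
function smartPunctuationSketch(string $text): string
{
    // strtr() tries the longest key first, so `---` is never read as `--` plus `-`,
    // and replaced text is not scanned again.
    return strtr($text, [
        '---' => "\u{2014}", // em dash
        '--' => "\u{2013}",  // en dash
        '...' => "\u{2026}", // ellipsis
    ]);
}

echo smartPunctuationSketch('pages 1--5 --- and so on...'), "\n";
```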
Extension Points
Event System (Recommended)
The preferred way to customize rendering is through the event system. This allows you to modify how nodes are rendered without subclassing:
use Djot\DjotConverter;
use Djot\Event\RenderEvent;
$converter = new DjotConverter();
// Customize link rendering
$converter->on('render.link', function (RenderEvent $event): void {
    $link = $event->getNode();
    $href = $link->getDestination();
    // Add attributes
    if (str_starts_with($href, 'http')) {
        $link->setAttribute('target', '_blank');
    }
    // Or completely override the HTML output
    // $event->setHtml('<custom-link>...</custom-link>');
});
Events are fired for each node type using the pattern render.{node_type}:
- render.paragraph, render.heading, render.code_block, etc.
- render.link, render.image, render.emphasis, render.symbol, etc.
See the Cookbook for common customization recipes.
Adding New Syntax
For entirely new syntax elements, extend the parser:
- Create a new Node class extending BlockNode or InlineNode
- Add parsing logic in BlockParser or InlineParser
- Add rendering in HtmlRenderer
Example for a new block type:
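// 1. Create the node class
class Alert extends BlockNode {
    public function getType(): string { return 'alert'; }
}
// 2. Add a parser method to BlockParser
protected function tryParseAlert(Node $parent, array $lines, int $start): ?int {
    // Parse the alert starting at $start and return the number of
    // consumed lines; return null when the lines do not match.
    return null;
}
// 3. Add tryParseAlert() to the parse order in parseBlocks()
// 4. Add a rendering arm to the match expression in HtmlRenderer
$node instanceof Alert => $this->renderAlert($node),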