ANTLR4 Grammar
Fileseq v3 uses an ANTLR4 grammar for parsing file sequences. This grammar is shared across Python, Go, and C++ implementations to ensure consistent behavior.
Grammar File
The complete grammar is defined in src/fileseq/grammar/fileseq.g4:
grammar fileseq;

// ============================================================================
// Parser Rules
// ============================================================================

input
    : sequence EOF
    | patternOnly EOF
    | singleFrame EOF
    | plainFile EOF
    ;

// Sequence with padding: /path/file.1-100#.exr or /path/file1-100#.exr
// Also handles hidden files: /path/.hidden5.1-10#.7zip
// Basename and extension are optional (allows patterns like /path/1-100#.ext)
// Python-specific: Supports optional subframe patterns:
//   - Dual range: /path/file.1-5#.10-20@@.exr (frameRange with dot prefix + padding)
//   - Composite padding: /path/file.1-5@.#.exr (dot + padding only)
// Go/C++ ignore the second pair until subframe support is implemented
sequence
    : directory basename? frameRange padding (frameRange padding | SPECIAL_CHAR padding)? extension*
    ;

// Pattern-only sequence (padding without frame range): /path/file.@@.ext
// Basename and extension are optional (allows patterns like /path/@@@.ext)
// Python-specific: Supports optional subframe padding: /path/file.#.#.ext
// Go/C++ ignore the second padding until subframe support is implemented
patternOnly
    : directory basename? padding (SPECIAL_CHAR padding)? extension*
    ;

// Single frame: /path/file.100.exr (extension required after frame number)
// Also handles hidden files: /path/.hidden.100.ext
// Python semantic rule: a dot-number is only treated as a frame if there's an extension after it
// Basename is optional to handle cases like .10000000000.123 where both are DOT_NUM tokens
singleFrame
    : directory basename? frameNum extension+
    ;

// Plain file: /path/file.txt or /path/file or /path/.hidden (no frame pattern)
plainFile
    : directory plainBasename? extension*
    ;

// Directory: optional leading slash(es) + segments ending with slash
// SLASH SLASH? allows one or two leading slashes to support UNC paths (//server/share/)
directory
    : (SLASH SLASH?)? (dirSegment SLASH)*
    ;

// Any token valid in a basename or directory segment.
// Adding a new lexer token that belongs in basenames only requires updating this rule.
basenameChar
    : WORD | NUM | DOT_NUM | DASH | SPECIAL_CHAR | EXTENSION | FRAME_RANGE | DOT_FRAME_RANGE | WS | OTHER_CHAR
    ;

// Reduced set for plain-file basenames: excludes EXTENSION and DOT_NUM so they
// are left for the extension rule to consume.
// HASH and AT are included so that filenames containing literal '#' or '@' characters
// (e.g. "helloMyPhone#Is911.json") are accepted as plain files rather than rejected.
// The parser still prefers sequence/patternOnly alternatives (which appear earlier in the
// input rule) when '#' or '@' appears in a valid padding position.
plainBasenameChar
    : WORD | NUM | DASH | SPECIAL_CHAR | FRAME_RANGE | DOT_FRAME_RANGE | WS | OTHER_CHAR | HASH | AT
    ;

// Directory segments can contain anything including frame-range-like patterns
// Includes EXTENSION to handle dots in directory names (e.g. path.with.dots)
// Includes WS to preserve whitespace in directory names
// Includes OTHER_CHAR for special characters like ! $ % ( ) etc.
dirSegment
    : basenameChar+
    ;

// Basename for sequence, patternOnly, and singleFrame rules.
// Includes EXTENSION (for hidden files like .hidden) and DOT_NUM.
// Also includes FRAME_RANGE tokens for date-like patterns (e.g., "name_2025-05-13_")
basename
    : basenameChar+
    ;

// Basename for plain files: does NOT include EXTENSION or DOT_NUM
// (so both regular and digit-only extensions can be consumed by extension rule)
// But DOES include FRAME_RANGE tokens (for filenames like "name_2025-05-13.ext")
plainBasename
    : plainBasenameChar+
    ;

// Frame range: may or may not have leading dot
// Also includes single frame numbers (for single-frame sequences with padding)
frameRange
    : DOT_FRAME_RANGE
    | FRAME_RANGE
    | DOT_NUM           // Single frame with dot: .100
    | NUM               // Single frame without dot: 100
    ;

// Single frame number with leading dot: .100 or .-10
frameNum
    : DOT_NUM
    ;

// Padding may use mixed characters (e.g. ###@ = 13 chars with HASH4 style)
// Each language's PaddingCharsSize handles per-character width calculation
padding
    : UDIM_ANGLE
    | UDIM_PAREN
    | PRINTF_PAD
    | HOUDINI_PAD
    | (HASH | AT)+
    ;

// Extension can be:
// - EXTENSION tokens (.tar, .gz, .exr)
// - DOT_NUM for digit-only extensions (.123, .10000000000)
// - WORD for non-dot extensions after padding (_exr, _extra)
// - Followed by optional DASH and NUM (for extensions like .tar.gz-1)
extension
    : EXTENSION (DASH NUM)?
    | DOT_NUM
    | WORD
    ;

// ============================================================================
// Lexer Rules - ORDER MATTERS FOR PRIORITY
// ============================================================================

// Padding markers - HIGHEST PRIORITY
// Note: These must come before OTHER_CHAR to match padding first
UDIM_ANGLE: '<UDIM>';
UDIM_PAREN: '%(UDIM)d';
PRINTF_PAD: '%' [0-9]* 'd';
HOUDINI_PAD: '$F' [0-9]*;
HASH: '#';
AT: '@';

// Extension: dot + pattern containing at least one letter
EXTENSION: '.' ([a-zA-Z_] | [0-9]* [a-zA-Z] [a-zA-Z0-9_]*);

// Frame range with leading dot (must have comma, colon, or dash after first number)
// Matches: .1-100, .-10-100, .1,2,3, .1-10x2, .1,2,3,5-10,20-30
// Optional decimal suffix for decimal step values: .1-5x0.25
DOT_FRAME_RANGE: '.' '-'? [0-9]+ [,:-] [0-9xy:,-]* ('.' [0-9]+)?;

// Frame range without leading dot (must have comma, colon, or dash after first number)
// Matches: 1-100, -10-100, 1,2,3, 1-10x2
// Optional decimal suffix for decimal step values: 1-5x0.25
FRAME_RANGE: '-'? [0-9]+ [,:-] [0-9xy:,-]* ('.' [0-9]+)?;

// Frame number with dot: .100 or .-10 (single frame, no range delimiter)
DOT_NUM: '.' '-'? [0-9]+;

// Slash separator
SLASH: '/' | '\\';

// Special characters commonly used in basenames
SPECIAL_CHAR: [:,.];

// Number sequence (for basenames containing numbers)
NUM: [0-9]+;

// Words (letters and underscores, no digits or dashes)
WORD: [a-zA-Z_]+;

// Dash as separate token
DASH: '-';

// Whitespace as token (don't skip - it's part of filenames)
WS: [ \t\r\n]+;

// Other valid filename characters (catch-all for POSIX/Windows filenames)
// Excludes: / \ (path separators), whitespace, and core tokens
// Includes: ! $ % & ' ( ) + ; = [ ] { } ~ and other printable ASCII
// Note: $ and % may conflict with padding tokens (HOUDINI_PAD, PRINTF_PAD) in edge cases
// This is acceptable - such patterns are rare in VFX workflows
OTHER_CHAR: ~[/\\\r\n\t .,:a-zA-Z0-9_#@<>-]+;
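To get a feel for how the lexer rules interact, the sketch below re-expresses a subset of them as Python regular expressions and applies the ANTLR4 lexer convention: the longest match wins, and ties go to the rule defined first. It is only an illustration, not the generated lexer. The names `TOKEN_RULES` and `tokenize` are hypothetical, and the UDIM, printf, Houdini, WS, and OTHER_CHAR rules are left out for brevity.

```python
import re

# Python translations of a subset of the lexer rules, kept in grammar order.
# EXTENSION's two alternatives are swapped because Python's `|` tries
# alternatives left to right, while ANTLR takes the longest match.
TOKEN_RULES = [
    ("HASH", r"#"),
    ("AT", r"@"),
    ("EXTENSION", r"\.(?:[0-9]*[a-zA-Z][a-zA-Z0-9_]*|[a-zA-Z_])"),
    ("DOT_FRAME_RANGE", r"\.-?[0-9]+[,:-][0-9xy:,-]*(?:\.[0-9]+)?"),
    ("FRAME_RANGE", r"-?[0-9]+[,:-][0-9xy:,-]*(?:\.[0-9]+)?"),
    ("DOT_NUM", r"\.-?[0-9]+"),
    ("SLASH", r"[/\\]"),
    ("SPECIAL_CHAR", r"[:,.]"),
    ("NUM", r"[0-9]+"),
    ("WORD", r"[a-zA-Z_]+"),
    ("DASH", r"-"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_RULES]


def tokenize(text):
    """Split text into (rule_name, lexeme) pairs using longest-match-wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (rule_name, end_offset)
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if match and (best is None or match.end() > best[1]):
                best = (name, match.end())
        if best is None:
            raise ValueError(f"no rule in this subset matches at offset {pos}")
        name, end = best
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens


print(tokenize("file.1-100#.exr"))
print(tokenize("file.100.exr"))
```

In the first input, `.1-100` is also a valid start for DOT_NUM (`.1`) and SPECIAL_CHAR (`.`), but DOT_FRAME_RANGE consumes the most characters and therefore wins. In the second, `.100` has no range delimiter, so it falls through to DOT_NUM.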
Regenerating the Parser
If you modify the grammar, regenerate the Python parser:
# Using hatch
hatch run generate
# Or directly with Java
java -jar tools/antlr-4.13.1-complete.jar \
    -Dlanguage=Python3 \
    -visitor \
    -o src/fileseq/parser \
    src/fileseq/grammar/fileseq.g4
Requirements:
- Java 11+ in PATH
- ANTLR 4.13.1 JAR (included in tools/)
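If you would rather drive the direct Java invocation from a script, a minimal sketch might look like the following. It only wraps the command shown above and checks the two stated requirements first. The function names `build_antlr_command` and `regenerate_parser` are hypothetical helpers, not part of fileseq, and the paths are assumed to be relative to the repository root.

```python
import shutil
import subprocess
from pathlib import Path

# Paths taken from the command above, relative to the repository root.
ANTLR_JAR = Path("tools/antlr-4.13.1-complete.jar")
GRAMMAR = Path("src/fileseq/grammar/fileseq.g4")
OUTPUT_DIR = Path("src/fileseq/parser")


def build_antlr_command(jar=ANTLR_JAR, grammar=GRAMMAR, out_dir=OUTPUT_DIR):
    """Return the argument list for regenerating the Python parser."""
    return [
        "java", "-jar", str(jar),
        "-Dlanguage=Python3",
        "-visitor",
        "-o", str(out_dir),
        str(grammar),
    ]


def regenerate_parser():
    """Check the requirements, then run ANTLR (hypothetical helper)."""
    if shutil.which("java") is None:
        raise RuntimeError("Java 11+ must be available in PATH")
    if not ANTLR_JAR.is_file():
        raise FileNotFoundError(f"ANTLR JAR not found: {ANTLR_JAR}")
    # check=True raises CalledProcessError if ANTLR reports a failure.
    subprocess.run(build_antlr_command(), check=True)
```

Calling `regenerate_parser()` from the repository root would then run the same command as the shell example.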
Grammar Rules
The grammar defines four main patterns:
- sequence
  Full sequence with frame range and padding: /path/file.1-100#.exr
- patternOnly
  Padding without explicit frame range: /path/file.#.exr
- singleFrame
  Single frame file: /path/file.0100.exr
- plainFile
  No frame pattern: /path/file.txt
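The four patterns can be approximated with ordinary regular expressions tried in the same order as the alternatives of the `input` rule. The sketch below is a rough stand-in for illustration, not the ANTLR parser: it ignores subframes, whitespace, and many edge cases the grammar handles, and the names `PATTERNS` and `classify` are hypothetical.

```python
import re

# Simplified building blocks, loosely modelled on the grammar's tokens.
PAD = r"(?:<UDIM>|%\(UDIM\)d|%[0-9]*d|\$F[0-9]*|[#@]+)"
RANGE = r"\.?-?[0-9]+(?:[,:-][0-9xy:,-]*)?"
EXT = r"(?:\.[A-Za-z0-9_]+)"

# Tried top to bottom, mirroring the order of alternatives in `input`.
PATTERNS = [
    ("sequence", re.compile(rf"^.*?{RANGE}{PAD}{EXT}*$")),
    ("patternOnly", re.compile(rf"^.*?{PAD}{EXT}*$")),
    # An extension is required after the frame number.
    ("singleFrame", re.compile(rf"^.*?\.-?[0-9]+{EXT}+$")),
]


def classify(path):
    """Return which of the four patterns the file name most resembles."""
    name = re.split(r"[/\\]", path)[-1]  # drop the directory part
    for label, regex in PATTERNS:
        if regex.match(name):
            return label
    return "plainFile"  # fallback: no frame pattern found


for example in (
    "/path/file.1-100#.exr",
    "/path/file.#.exr",
    "/path/file.0100.exr",
    "/path/file.txt",
):
    print(example, "->", classify(example))
```

Because `singleFrame` requires an extension after the frame number, a name such as `/path/file.100` falls through to `plainFile` in this sketch, which matches the rule described in the grammar comments.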
Python-Specific Features
The Python implementation supports additional subframe notation:
- Dual range: file.1-5#.10-20@@.exr (main frames + subframes)
- Composite padding: file.1-5@.#.exr (frame + subframe padding)
- Pattern only: file.#.#.exr (wildcard for both components)
The shared grammar parses these patterns in every language, but the Go and C++ implementations ignore the second range/padding pair until they add subframe support.
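To make the composite padding concrete, the sketch below computes the zero-pad width of each component, assuming the HASH4 style mentioned in the grammar comments (`#` is four digits, `@` is one, so `###@` is 13). The helpers `padding_width` and `split_composite_padding` are hypothetical names for illustration and are not fileseq's API.

```python
from typing import Optional, Tuple

# Assumed HASH4-style widths, per the grammar comment "###@ = 13 chars".
PAD_WIDTHS = {"#": 4, "@": 1}


def padding_width(pad: str) -> int:
    """Sum the per-character widths of a '#'/'@' padding string."""
    if not pad or any(ch not in PAD_WIDTHS for ch in pad):
        raise ValueError(f"not a '#'/'@' padding string: {pad!r}")
    return sum(PAD_WIDTHS[ch] for ch in pad)


def split_composite_padding(pad: str) -> Tuple[int, Optional[int]]:
    """Return (frame_width, subframe_width); subframe is None if absent."""
    frame, dot, subframe = pad.partition(".")
    return padding_width(frame), (padding_width(subframe) if dot else None)


print(padding_width("###@"))           # 13
print(split_composite_padding("@.#"))  # (1, 4)
print(split_composite_padding("#"))    # (4, None)
```

For `file.1-5@.#.exr`, the `@.#` portion would therefore describe a one-digit frame followed by a four-digit subframe under this assumption.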