ANTLR4 Grammar

Fileseq v3 parses file sequences with an ANTLR4 grammar. The same grammar is shared by the Python, Go, and C++ implementations so that all three behave consistently.

Grammar File

The complete grammar is defined in src/fileseq/grammar/fileseq.g4:

grammar fileseq;

// ============================================================================
// Parser Rules
// ============================================================================

input
    : sequence EOF
    | patternOnly EOF
    | singleFrame EOF
    | plainFile EOF
    ;

// Sequence with padding: /path/file.1-100#.exr or /path/file1-100#.exr
// Also handles hidden files: /path/.hidden5.1-10#.7zip
// Basename and extension are optional (allows patterns like /path/1-100#.ext)
// Python-specific: Supports optional subframe patterns:
//   - Dual range: /path/file.1-5#.10-20@@.exr (frameRange with dot prefix + padding)
//   - Composite padding: /path/file.1-5@.#.exr (dot + padding only)
//   Go/C++ ignore the second pair until subframe support is implemented
sequence
    : directory basename? frameRange padding (frameRange padding | SPECIAL_CHAR padding)? extension*
    ;

// Pattern-only sequence (padding without frame range): /path/file.@@.ext
// Basename and extension are optional (allows patterns like /path/@@@.ext)
// Python-specific: Supports optional subframe padding: /path/file.#.#.ext
//   Go/C++ ignore the second padding until subframe support is implemented
patternOnly
    : directory basename? padding (SPECIAL_CHAR padding)? extension*
    ;

// Single frame: /path/file.100.exr (extension required after frame number)
// Also handles hidden files: /path/.hidden.100.ext
// Python semantic rule: a dot-number is only treated as a frame if there's an extension after it
// Basename is optional to handle cases like .10000000000.123 where both are DOT_NUM tokens
singleFrame
    : directory basename? frameNum extension+
    ;

// Plain file: /path/file.txt or /path/file or /path/.hidden (no frame pattern)
plainFile
    : directory plainBasename? extension*
    ;

// Directory: optional leading slash(es) + segments ending with slash
// SLASH SLASH? allows one or two leading slashes to support UNC paths (//server/share/)
directory
    : (SLASH SLASH?)? (dirSegment SLASH)*
    ;

// Any token valid in a basename or directory segment.
// Adding a new lexer token that belongs in basenames only requires updating this rule.
basenameChar
    : WORD | NUM | DOT_NUM | DASH | SPECIAL_CHAR | EXTENSION | FRAME_RANGE | DOT_FRAME_RANGE | WS | OTHER_CHAR
    ;

// Reduced set for plain-file basenames: excludes EXTENSION and DOT_NUM so they
// are left for the extension rule to consume.
// HASH and AT are included so that filenames containing literal '#' or '@' characters
// (e.g. "helloMyPhone#Is911.json") are accepted as plain files rather than rejected.
// The parser still prefers sequence/patternOnly alternatives (which appear earlier in the
// input rule) when '#' or '@' appears in a valid padding position.
plainBasenameChar
    : WORD | NUM | DASH | SPECIAL_CHAR | FRAME_RANGE | DOT_FRAME_RANGE | WS | OTHER_CHAR | HASH | AT
    ;

// Directory segments can contain anything including frame-range-like patterns
// Includes EXTENSION to handle dots in directory names (e.g. path.with.dots)
// Includes WS to preserve whitespace in directory names
// Includes OTHER_CHAR for special characters like ! $ % ( ) etc.
dirSegment
    : basenameChar+
    ;

// Basename for sequence, patternOnly, and singleFrame rules.
// Includes EXTENSION (for hidden files like .hidden) and DOT_NUM.
// Also includes FRAME_RANGE tokens for date-like patterns (e.g., "name_2025-05-13_")
basename
    : basenameChar+
    ;

// Basename for plain files: does NOT include EXTENSION or DOT_NUM
// (so both regular and digit-only extensions can be consumed by extension rule)
// But DOES include FRAME_RANGE tokens (for filenames like "name_2025-05-13.ext")
plainBasename
    : plainBasenameChar+
    ;

// Frame range: may or may not have leading dot
// Also includes single frame numbers (for single-frame sequences with padding)
frameRange
    : DOT_FRAME_RANGE
    | FRAME_RANGE
    | DOT_NUM      // Single frame with dot: .100
    | NUM          // Single frame without dot: 100
    ;

// Single frame number with leading dot: .100 or .-10
frameNum
    : DOT_NUM
    ;

// Padding may use mixed characters (e.g. ###@ = 13 chars with HASH4 style)
// Each language's PaddingCharsSize handles per-character width calculation
padding
    : UDIM_ANGLE
    | UDIM_PAREN
    | PRINTF_PAD
    | HOUDINI_PAD
    | (HASH | AT)+
    ;

// Extension can be:
// - EXTENSION tokens (.tar, .gz, .exr)
// - DOT_NUM for digit-only extensions (.123, .10000000000)
// - WORD for non-dot extensions after padding (_exr, _extra)
// - Followed by optional DASH and NUM (for extensions like .tar.gz-1)
extension
    : EXTENSION (DASH NUM)?
    | DOT_NUM
    | WORD
    ;

// ============================================================================
// Lexer Rules - ORDER MATTERS FOR PRIORITY
// ============================================================================

// Padding markers - HIGHEST PRIORITY
// Note: These must come before OTHER_CHAR to match padding first
UDIM_ANGLE: '<UDIM>';
UDIM_PAREN: '%(UDIM)d';
PRINTF_PAD: '%' [0-9]* 'd';
HOUDINI_PAD: '$F' [0-9]*;
HASH: '#';
AT: '@';

// Extension: dot + pattern containing at least one letter
EXTENSION: '.' ([a-zA-Z_] | [0-9]* [a-zA-Z] [a-zA-Z0-9_]*);

// Frame range with leading dot (must have comma, colon, or dash after first number)
// Matches: .1-100, .-10-100, .1,2,3, .1-10x2, .1,2,3,5-10,20-30
// Optional decimal suffix for decimal step values: .1-5x0.25
DOT_FRAME_RANGE: '.' '-'? [0-9]+ [,:-] [0-9xy:,-]* ('.' [0-9]+)?;

// Frame range without leading dot (must have comma, colon, or dash after first number)
// Matches: 1-100, -10-100, 1,2,3, 1-10x2
// Optional decimal suffix for decimal step values: 1-5x0.25
FRAME_RANGE: '-'? [0-9]+ [,:-] [0-9xy:,-]* ('.' [0-9]+)?;

// Frame number with dot: .100 or .-10 (single frame, no range delimiter)
DOT_NUM: '.' '-'? [0-9]+;

// Slash separator
SLASH: '/' | '\\';

// Special characters commonly used in basenames
SPECIAL_CHAR: [:,.];

// Number sequence (for basenames containing numbers)
NUM: [0-9]+;

// Words (letters and underscores, no digits or dashes)
WORD: [a-zA-Z_]+;

// Dash as separate token
DASH: '-';

// Whitespace as token (don't skip - it's part of filenames)
WS: [ \t\r\n]+;

// Other valid filename characters (catch-all for POSIX/Windows filenames)
// Excludes: / \ (path separators), whitespace, and core tokens
// Includes: ! $ % & ' ( ) + ; = [ ] { } ~ and other printable ASCII
// Note: $ and % may conflict with padding tokens (HOUDINI_PAD, PRINTF_PAD) in edge cases
// This is acceptable - such patterns are rare in VFX workflows
OTHER_CHAR: ~[/\\\r\n\t .,:a-zA-Z0-9_#@<>-]+;
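
To illustrate why rule order and match length matter in the lexer section, the following Python sketch hand-translates the lexer rules into regular expressions and imitates ANTLR's selection policy: at each position the longest match wins, and a tie goes to the rule listed first. It is an illustration only, not the generated lexer. The names LEXER_RULES and tokenize are hypothetical, and the regex translations are assumed to behave like the grammar for ordinary inputs.

```python
import re

# Lexer rules in grammar order, hand-translated to Python regexes (an approximation).
LEXER_RULES = [
    ("UDIM_ANGLE", r"<UDIM>"),
    ("UDIM_PAREN", r"%\(UDIM\)d"),
    ("PRINTF_PAD", r"%[0-9]*d"),
    ("HOUDINI_PAD", r"\$F[0-9]*"),
    ("HASH", r"#"),
    ("AT", r"@"),
    # Longer alternative first so Python's ordered alternation yields the longest match.
    ("EXTENSION", r"\.(?:[0-9]*[a-zA-Z][a-zA-Z0-9_]*|[a-zA-Z_])"),
    ("DOT_FRAME_RANGE", r"\.-?[0-9]+[,:-][0-9xy:,-]*(?:\.[0-9]+)?"),
    ("FRAME_RANGE", r"-?[0-9]+[,:-][0-9xy:,-]*(?:\.[0-9]+)?"),
    ("DOT_NUM", r"\.-?[0-9]+"),
    ("SLASH", r"[/\\]"),
    ("SPECIAL_CHAR", r"[:,.]"),
    ("NUM", r"[0-9]+"),
    ("WORD", r"[a-zA-Z_]+"),
    ("DASH", r"-"),
    ("WS", r"[ \t\r\n]+"),
    ("OTHER_CHAR", r"[^/\\\r\n\t .,:a-zA-Z0-9_#@<>-]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in LEXER_RULES]


def tokenize(text):
    """Split text into (token_name, lexeme) pairs using longest-match, first-rule-wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            # Strictly longer matches replace the current best, so ties keep the earlier rule.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no lexer rule matches at position {pos}: {text[pos]!r}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


for token in tokenize("/path/file.1-100#.exr"):
    print(token)
```

In this sketch, /path/file.1-100#.exr splits into SLASH, WORD, SLASH, WORD, DOT_FRAME_RANGE (.1-100), HASH, and EXTENSION (.exr). DOT_FRAME_RANGE is chosen over DOT_NUM and SPECIAL_CHAR because it consumes more characters at that position.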

Regenerating the Parser

If you modify the grammar, regenerate the Python parser:

# Using hatch
hatch run generate

# Or directly with Java
java -jar tools/antlr-4.13.1-complete.jar \
    -Dlanguage=Python3 \
    -visitor \
    -o src/fileseq/parser \
    src/fileseq/grammar/fileseq.g4
Requirements:
  • Java 11+ in PATH

  • ANTLR 4.13.1 JAR (included in tools/)
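
If you prefer to drive the regeneration from Python, the direct Java invocation above could be wrapped roughly as follows. This is a sketch under stated assumptions: build_antlr_command and regenerate are hypothetical helpers, not part of Fileseq, and the paths are the ones quoted above, resolved relative to the repository root.

```python
import shutil
import subprocess
from pathlib import Path

# Paths taken from the command shown above; assumed relative to the repository root.
ANTLR_JAR = Path("tools/antlr-4.13.1-complete.jar")
GRAMMAR = Path("src/fileseq/grammar/fileseq.g4")
OUTPUT_DIR = Path("src/fileseq/parser")


def build_antlr_command(jar=ANTLR_JAR, grammar=GRAMMAR, output_dir=OUTPUT_DIR):
    """Return the argument list equivalent to the direct Java invocation."""
    return [
        "java",
        "-jar", Path(jar).as_posix(),
        "-Dlanguage=Python3",
        "-visitor",
        "-o", Path(output_dir).as_posix(),
        Path(grammar).as_posix(),
    ]


def regenerate(jar=ANTLR_JAR, grammar=GRAMMAR, output_dir=OUTPUT_DIR):
    """Check the listed requirements, then run ANTLR (hypothetical helper)."""
    if shutil.which("java") is None:
        raise RuntimeError("java was not found in PATH")
    if not Path(jar).is_file():
        raise FileNotFoundError(f"ANTLR JAR not found: {jar}")
    subprocess.run(build_antlr_command(jar, grammar, output_dir), check=True)


print(" ".join(build_antlr_command()))
```

Calling regenerate() from the repository root would then run the same command after confirming that Java and the JAR are available. The sketch does not check the Java version.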

Grammar Rules

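The top-level input rule accepts four kinds of path. When more than one alternative could match, the parser prefers the one listed earlier: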

sequence

Full sequence with frame range and padding: /path/file.1-100#.exr

patternOnly

Padding without explicit frame range: /path/file.#.exr

singleFrame

Single frame file: /path/file.0100.exr

plainFile

No frame pattern: /path/file.txt
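
A rough way to see how the four alternatives differ is to classify a file name with ordered regular expressions. The sketch below is only an approximation: it understands only '#' and '@' padding and dot-prefixed extensions, and it ignores the directory part. The names classify and RULES are hypothetical, and the real decision is made by the ANTLR parser, not by these patterns.

```python
import re

# Simplified stand-ins for the grammar's frameRange, padding, and extension rules.
FRAME_RANGE = r"\.?-?\d+(?:[,:-][0-9xy:,-]*)?"
PADDING = r"[#@]+"
EXTENSIONS = r"(?:\.\w+)*"       # zero or more dot-prefixed extensions
EXTENSIONS_REQ = r"(?:\.\w+)+"   # at least one extension, as singleFrame requires

# Checked in the same order as the alternatives of the input rule.
RULES = [
    ("sequence", re.compile(rf".*?{FRAME_RANGE}{PADDING}{EXTENSIONS}")),
    ("patternOnly", re.compile(rf".*?{PADDING}{EXTENSIONS}")),
    ("singleFrame", re.compile(rf".*?\.-?\d+{EXTENSIONS_REQ}")),
]


def classify(path):
    """Return the name of the first rule whose simplified pattern matches the file name."""
    name = re.split(r"[/\\]", path)[-1]  # drop the directory part
    for rule, regex in RULES:
        if regex.fullmatch(name):
            return rule
    return "plainFile"  # fallback, mirroring the last alternative


for example in (
    "/path/file.1-100#.exr",
    "/path/file.#.exr",
    "/path/file.0100.exr",
    "/path/file.txt",
):
    print(example, "->", classify(example))
```

Under these simplifications, the four examples above map to sequence, patternOnly, singleFrame, and plainFile respectively. A name such as helloMyPhone#Is911.json falls through to plainFile because the text after the '#' is not a valid run of extensions.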

Python-Specific Features

The Python implementation supports additional subframe notation:

  • Dual range: file.1-5#.10-20@@.exr (main frames + subframes)

  • Composite padding: file.1-5@.#.exr (frame + subframe padding)

  • Pattern only: file.#.#.exr (wildcard for both components)

Because the grammar is shared, all implementations parse these patterns, but Go and C++ ignore the second (subframe) component until they add subframe support.
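
The optional second component can be pictured with a small regular expression that mirrors the "(frameRange padding | SPECIAL_CHAR padding)?" tail of the sequence and patternOnly rules. This is a simplified sketch, not the Fileseq API: split_subframe and SUBFRAME_RE are hypothetical names, and only '#' and '@' padding is modelled.

```python
import re

# Simplified stand-ins for the grammar's frameRange and padding rules.
FRAME_RANGE = r"\.?-?\d+(?:[,:-][0-9xy:,-]*)?"
PADDING = r"[#@]+"

# Main component, then either "frameRange padding" or "SPECIAL_CHAR padding".
SUBFRAME_RE = re.compile(
    rf"(?P<frames>{FRAME_RANGE})?(?P<pad>{PADDING})"
    rf"(?:(?P<sub_frames>{FRAME_RANGE})(?P<sub_pad>{PADDING})"
    rf"|[:,.](?P<sub_pad_only>{PADDING}))"
)


def split_subframe(name):
    """Return the main and subframe components, or None if there is no second component."""
    match = SUBFRAME_RE.search(name)
    if match is None:
        return None
    return {
        "frames": match["frames"],
        "pad": match["pad"],
        "sub_frames": match["sub_frames"],
        "sub_pad": match["sub_pad"] or match["sub_pad_only"],
    }


for example in ("file.1-5#.10-20@@.exr", "file.1-5@.#.exr", "file.#.#.exr"):
    print(example, "->", split_subframe(example))
```

A name without a second component, such as file.1-100#.exr, yields None in this sketch, which corresponds to the case where the optional group in the grammar is absent.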