AxlFind — Byte-Substring Search Engine
See AxlData — Data Structures for an overview of all data modules.
A Boyer-Moore-Horspool byte-substring search (with case-insensitive and
whole-word variants, forward and backward) that runs over an abstract
random-access byte source — an AxlByteReader. The same engine drives
a flat memory block (the built-in AxlMemReader), a
AxlTextBuffer (gap buffer), and a
AxlPieceTree (out-of-core piece table); the
axl_text_buffer_find / axl_piece_tree_find wrappers build the
appropriate reader and call axl_find_in_source.
The reader pulls overlapping windows so a match straddling the source’s
internal boundaries (a piece boundary, the gap) is never missed; a
contiguous source that supplies the optional peek is scanned in
place with no copy. A successful find reports an AxlMatch (start +
length); length is carried explicitly so the result shape already fits
variable-length matchers.
Header: <axl/axl-find.h>
API Reference
Enums
-
enum AxlFindFlags
Flags for axl_find_in_source and the axl_*_find wrappers.
axl-find.h:
Byte-substring search over an abstract random-access byte source.
The search engine (Boyer-Moore-Horspool, with case-insensitive and whole-word variants) reads through an
AxlByteReader— a tiny function-table over whatever holds the bytes — so the same engine drives a flat memory block, an AxlTextBuffer (gap buffer), and an AxlPieceTree (out-of-core piece table). The reader’sreadpulls windowed chunks (with overlap to catch matches spanning the source’s internal boundaries); an optionalpeeklets a contiguous source be scanned in place with no copy.axl_find_in_sourceis the single engine; the per-typeaxl_*_findwrappers (axl_text_buffer_find, axl_piece_tree_find) build the appropriate reader and call it. A successful find reports anAxlMatch(start + length); length is carried explicitly so the result shape already fits variable-length matchers (a future regex / fuzzy engine slots in behind the same reader without reshaping callers).Byte-oriented; the only “word” notion is for AXL_FIND_WHOLE_WORD, where a word byte is
[A-Za-z0-9_]. Single-threaded (UEFI).Values:
-
enumerator AXL_FIND_DEFAULT
-
enumerator AXL_FIND_CASE_INSENSITIVE
ASCII case fold.
-
enumerator AXL_FIND_BACKWARD
search toward offset 0
-
enumerator AXL_FIND_WHOLE_WORD
match only at word boundaries
-
enumerator AXL_FIND_DEFAULT
Functions
-
void axl_mem_reader_init(AxlMemReader *mem, const void *data, size_t len)
Initialize a contiguous in-memory reader over
data/len.- Parameters:
mem – reader to initialize (caller-owned)
data – borrowed bytes (must outlive the search)
len – number of bytes
-
bool axl_find_in_source(const AxlByteReader *reader, const char *needle, size_t needle_len, size_t from_offset, uint32_t flags, AxlMatch *out)
Search
readerfor theneedle_lenbytes atneedle, scanning fromfrom_offset. Forward (default) returns the lowest match with start >=from_offset;AXL_FIND_BACKWARDreturns the highest match with start <=from_offset.AXL_FIND_CASE_INSENSITIVEfolds ASCII case;AXL_FIND_WHOLE_WORDrequires non-word bytes on both sides. Matches spanning the source’s internal boundaries are handled. Wrap-around is the caller’s job.- Parameters:
reader – byte source to scan
needle – bytes to find
needle_len – length of
needlefrom_offset – where to start scanning
flags – AxlFindFlags
out – [out] match on success
- Returns:
true and fills
outon a match; false if not found (orneedle_lenis 0, or any required argument is NULL).
-
struct AxlMatch
- #include <axl-find.h>
A successful match: the bytes
[start, start + length).lengthequals the needle length for a literal find, but is reported explicitly so the result also fits variable-length matchers.
-
struct AxlByteReader
Random-access byte source the search engine reads through. An implementation fills the function pointers and
ctx; the engine never inspectsctxitself.Public Members
-
size_t (*length)(const AxlByteReader *r)
Total number of bytes the reader can serve.
-
size_t (*read)(const AxlByteReader *r, size_t offset, size_t len, void *buf)
Copy up to
lenbytes starting at logicaloffsetintobuf, returning the number actually copied (fewer thanlenonly whenoffset+ len runs past the end). Always present.
-
const char *(*peek)(const AxlByteReader *r, size_t offset, size_t len)
OPTIONAL zero-copy fast path: if
[offset, offset + len)is stored contiguously, return a direct pointer to those bytes; otherwise return NULL. NULL is always a safe answer — the engine falls back to read. May itself be NULL (no fast path).
-
void *ctx
implementation data (opaque to the engine)
-
size_t (*length)(const AxlByteReader *r)
-
struct AxlMemReader
- #include <axl-find.h>
Built-in
AxlByteReaderover a flat, contiguous memory block. The reader supports the zero-copypeekpath, so searching a memory block performs no copying. Initialize with axl_mem_reader_init and pass&mem.readerto axl_find_in_source. The bytes are borrowed —datamust outlive the search.Public Members
-
AxlByteReader reader
pass &reader to axl_find_in_source
-
const char *data
-
size_t len
-
AxlByteReader reader
AxlRegex — Regular-Expression Matcher
A compiled-pattern regular-expression matcher (<axl/axl-regex.h>)
layered on the same AxlByteReader seam. Compile a pattern once with
axl_regex_new, then search many times (find-all loops, one pattern
across many lines/buffers). It is a Thompson NFA / Pike VM, not a
backtracker, so match time is O(pattern × input) for every pattern —
there is no catastrophic (“ReDoS”) blow-up. Backreferences are
deliberately unsupported because they are not regular and would force
backtracking.
Supported: literals; .; greedy and lazy quantifiers
* + ? *? +? ??; anchors ^ $; alternation |; grouping and
capture ( ); classes [...] / [^...] with ranges; the escapes
\d \w \s \D \W \S and friends. Matching is byte-oriented and
leftmost (Perl / grep -P priority). Captures via
axl_regex_search_captures; compile flags CASELESS / MULTILINE
/ DOTALL; an ANCHORED match flag. The
axl_text_buffer_find_regex / axl_piece_tree_find_regex wrappers
run a compiled regex over those sources.
Regular-expression search — a compiled-pattern matcher over the axl-find.h byte-source seam.
A compiled pattern (AxlRegex): parse and compile once, then search many times — the right shape for find-all loops and matching one pattern across many lines or buffers. The matcher reads through the same AxlByteReader the literal find engine uses, so the same AxlRegex searches a flat buffer, an AxlTextBuffer, or an AxlPieceTree; a successful search reports the same AxlMatch.
The engine is a Thompson NFA / Pike VM, NOT a backtracker, so match time is O(pattern × input) for every pattern — there is no catastrophic (“ReDoS”) blowup. That guarantee is why backreferences are deliberately unsupported: they are not regular and would force backtracking.
Supported syntax: literals; .; greedy and lazy quantifiers * + ? *? +? ??; bounded repetition {n} / {n,} / {n,m} / {,m} (counts clamp to 1024; a { that is not a valid interval is a literal); anchors ^ $; alternation |; grouping and capture ( ); character classes [...] / [^...] with a-z ranges; the escapes \d \w \s \D \W \S \n \t \r \f \v and backslash-escaped metacharacters. Bounded repetition desugars to the base quantifiers, so a capture group repeated by an interval resolves to its last match. Named groups are not in this version. Matching is byte-oriented and leftmost (Perl / grep -P priority, not POSIX leftmost-longest).
AXL_AUTOPTR(AxlRegex) re = axl_regex_new("[0-9]+", AXL_REGEX_DEFAULT);
AxlMatch m;
if (re && axl_regex_search_buf(re, text, len, 0, AXL_REGEX_MATCH_DEFAULT, &m))
printf("first number at %zu, %zu bytes\n", m.start, m.length);
Defines
-
AXL_REGEX_NO_MATCH
Sentinel
startfor a capture group that did not participate.
Typedefs
Enums
-
enum AxlRegexFlags
Compile-time options (affect how the pattern is interpreted).
Values:
-
enumerator AXL_REGEX_DEFAULT
-
enumerator AXL_REGEX_CASELESS
ASCII case-insensitive matching.
-
enumerator AXL_REGEX_MULTILINE
^/$also match at\nboundaries
-
enumerator AXL_REGEX_DOTALL
.also matches\n
-
enumerator AXL_REGEX_DEFAULT
-
enum AxlRegexMatchFlags
Match-time options, passed per search.
Values:
-
enumerator AXL_REGEX_MATCH_DEFAULT
-
enumerator AXL_REGEX_MATCH_ANCHORED
match must start exactly at
from_offset
-
enumerator AXL_REGEX_MATCH_NOTBOL
Treat
from_offsetas MID-stream, not the start of the text:^(and the start-anchor) does NOT match atfrom_offset(POSIXREG_NOTBOL). Multiline^after an embedded\nstill matches. For chunked / windowed scanning of a larger source: pass it on every window except the one that begins at the true source start.
-
enumerator AXL_REGEX_MATCH_NOTEOL
Treat the buffer end as MID-stream, not the end of the text:
$(and the end-anchor) does NOT match atlen(POSIXREG_NOTEOL). Multiline$before an embedded\nstill matches. For windowed scanning: pass it on every window except the one ending at true EOF.
-
enumerator AXL_REGEX_MATCH_DEFAULT
Functions
-
AxlRegex *axl_regex_new(const char *pattern, uint32_t flags)
Compile a pattern.
- Parameters:
pattern – the regular-expression text (UTF-8 bytes)
flags – AxlRegexFlags
- Returns:
a compiled regex (free with axl_regex_free), or NULL if the pattern is malformed (the reason is logged) or on allocation failure. Use axl_regex_new_full to recover the error detail.
-
AxlRegex *axl_regex_new_full(const char *pattern, uint32_t flags, AxlRegexError *err)
Compile a pattern, reporting compile-error detail.
Identical to axl_regex_new but, on a malformed pattern, fills
err(when non-NULL) with the byte offset and a static reason string.erris left untouched on success and on allocation failure.- Parameters:
pattern – the regular-expression text
flags – AxlRegexFlags
err – [out] error detail on failure, or NULL
- Returns:
a compiled regex, or NULL (see axl_regex_new).
-
void axl_regex_free(AxlRegex *re)
Free a compiled regex. NULL-safe.
- Parameters:
re – compiled regex (may be NULL)
-
size_t axl_regex_capture_count(const AxlRegex *re)
Number of capture groups in the pattern (excluding group 0).
Group 0 is always the overall match; this counts the
( )groups, which are numbered 1..N left-to-right by opening parenthesis. Use it to size thegroupsarray for axl_regex_search_captures.- Parameters:
re – compiled regex
- Returns:
the capture-group count (0 if
reis NULL).
-
bool axl_regex_search(const AxlRegex *re, const AxlByteReader *reader, size_t from_offset, uint32_t match_flags, AxlMatch *out)
Find the leftmost match in a byte source.
Scans
readerfromfrom_offsetand reports the leftmost match (start >=from_offset). With AXL_REGEX_MATCH_ANCHORED the match must begin exactly atfrom_offset. Iterate by re-searching fromm.start + (m.length ? m.length : 1).The matcher needs a contiguous view of the scanned region: it uses the reader’s zero-copy
peekwhen available, otherwise it reads the region[from_offset, length)into a temporary buffer (so for a non-contiguous out-of-core source the cost is O(region) memory per call — fine for editor-sized regions; windowed matching is future work).- Parameters:
re – compiled regex
reader – byte source to scan
from_offset – where to start scanning
match_flags – AxlRegexMatchFlags
out – [out] overall match on success
- Returns:
true and fills
outon a match; false if no match (or any required argument is NULL).
-
bool axl_regex_search_buf(const AxlRegex *re, const void *data, size_t len, size_t from_offset, uint32_t match_flags, AxlMatch *out)
Find the leftmost match in a contiguous buffer.
Convenience over axl_regex_search for the common contiguous case; the bytes are borrowed and need not outlive the call.
- Parameters:
re – compiled regex
data – bytes to scan
len – number of bytes
from_offset – where to start scanning
match_flags – AxlRegexMatchFlags
out – [out] overall match on success
- Returns:
true and fills
outon a match; false otherwise.
-
bool axl_regex_search_captures(const AxlRegex *re, const AxlByteReader *reader, size_t from_offset, uint32_t match_flags, AxlMatch *groups, size_t n_groups)
Find the leftmost match and report capture groups.
Like axl_regex_search, but also fills
groupswith the captured sub-matches:groups[0]is the overall match andgroups[k]is the k-th( )group. At mostn_groupsentries are written (size it via axl_regex_capture_count + 1). A group that did not participate in the match (e.g. an unmatched optional or alternation branch) is reported withstart == AXL_REGEX_NO_MATCHandlength == 0.- Parameters:
re – compiled regex
reader – byte source to scan
from_offset – where to start scanning
match_flags – AxlRegexMatchFlags
groups – [out] group array (groups[0] = overall)
n_groups – capacity of
groups
- Returns:
true and fills
groupson a match; false otherwise.
-
struct AxlRegexError
- #include <axl-regex.h>
Detail for a failed compile (optional out param of axl_regex_new_full).