ScanFile

ScanFile is the main module supervising the search. It accepts the RE specification from GGrep, splits out the easiest part of the RE to scan for first (if one is available), and then prepares to search each RE (using STBM and/or the Grouse FSA as appropriate). The module also oversees the file search: it requests memory-buffered file input from FastFile, invokes the search engine as required on each buffer, and manages search reporting, including output handling when inverted match sense is selected.

The implementation of this module is a bit of a mess. Most noticeably, while just about everything else in Grouse Grep is reentrant, this module most certainly isn't. (To be fair, perhaps the large search context structure in MatchEng could be reworked as well, sigh.) Currently, buffer offsets are described using 32-bit integers, so byte offsets cannot be reported correctly once a file grows past what 32 bits can represent (4 GiB if the offsets are unsigned, 2 GiB if signed).
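To make the offset limitation concrete, here is a minimal sketch of what happens when a match offset is computed as "file position of the buffer start plus position within the buffer". The function names offset32 and offset64 are hypothetical and are not taken from ScanFile. The sketch also assumes unsigned offsets, which the text above does not confirm.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of ScanFile: compute the file offset
 * of a match from the buffer's starting position in the file and the
 * match's position inside that buffer. */

/* 32-bit version: the sum silently wraps modulo 2^32. */
static uint32_t offset32(uint32_t buf_start, uint32_t pos_in_buf)
{
    return buf_start + pos_in_buf;
}

/* 64-bit version: the same arithmetic, but wide enough for large files. */
static uint64_t offset64(uint64_t buf_start, uint64_t pos_in_buf)
{
    return buf_start + pos_in_buf;
}
```

With a buffer that starts 4096 bytes below the 4 GiB mark and a match 5000 bytes into that buffer, offset64 yields 4294968200, while offset32 wraps around and yields 904.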

An interesting aspect of the search is the role of fast scans and slower matches: ScanFile, like the [Self-]Tuned Boyer-Moore algorithms and the state table search, splits the search effort into a fast scan component and a slower match component where feasible, based on the EasiestFirst analysis by RegExp. This scan/match separation seems to be a universal feature of high-speed search algorithms.
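The scan/match split can be illustrated with a deliberately simple stand-in. This is not the STBM algorithm or the Grouse FSA, and scan_then_match is a hypothetical name. The fast scan here is just memchr hunting for the first byte of a literal pattern, and the slower match is a memcmp that runs only at the candidate positions the scan turns up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not ScanFile code: find a literal pattern in a
 * buffer by separating a cheap scan from a more expensive match.
 * Returns a pointer to the first occurrence, or NULL if there is none. */
static const char *scan_then_match(const char *buf, size_t buf_len,
                                   const char *pat, size_t pat_len)
{
    const char *p = buf;
    const char *last_start;

    if (pat_len == 0 || pat_len > buf_len)
        return NULL;

    /* Last position at which a complete match could still begin. */
    last_start = buf + (buf_len - pat_len);

    while (p <= last_start) {
        /* Fast scan: skip ahead to the next byte equal to pat[0]. */
        p = memchr(p, pat[0], (size_t)(last_start - p) + 1);
        if (p == NULL)
            return NULL;

        /* Slower match: verify the whole pattern at this candidate. */
        if (memcmp(p, pat, pat_len) == 0)
            return p;

        p++;    /* False alarm: resume scanning just past it. */
    }
    return NULL;
}
```

The loop spends most of its time in the scan, so the payoff depends on how rarely the scan fires. That is presumably why it matters which part of the RE is judged easiest to scan for.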

ScanFile also handles directory recursion, although it doesn't cover the full set of skip and read-as-binary options supported by GNU Grep. [Most ironically, I wasn't going to include recursion, until my private e-mail to the author ended up being published as the letter "Grepping and Globbing" in the September 1999 Dr. Dobb's Journal Letters to the Editor. In this message, I noted how the DOS version of GGrep supported file globbing and directory recursion.]

ScanFile's performance could improve if the search management were a little more sophisticated: for example, the word edge tests from the -w switch could be handled separately and more directly instead of being appended to the RE being searched.
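As a rough idea of what a separate, more direct word edge test might look like, the sketch below checks a candidate match after the fact rather than folding the test into the RE. The names is_word_char and match_on_word_edges are hypothetical. The sketch assumes the usual -w convention that word characters are letters, digits and underscore, and it checks only one candidate, so a real implementation would still need to retry later matches on the same line.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: word characters are letters, digits and '_'. */
static int is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Hypothetical sketch, not ScanFile code: return 1 if the candidate
 * match line[start .. start+len-1] has a word edge on both sides,
 * that is, it is neither preceded nor followed by a word character.
 * Return 0 otherwise, or if the match does not fit inside the line. */
static int match_on_word_edges(const char *line, size_t line_len,
                               size_t start, size_t len)
{
    size_t end = start + len;   /* Index just past the match */

    if (end > line_len)
        return 0;
    if (start > 0 && is_word_char((unsigned char)line[start - 1]))
        return 0;
    if (end < line_len && is_word_char((unsigned char)line[end]))
        return 0;
    return 1;
}
```

Run over the line "cat concat cat_" with the three occurrences of "cat", only the first passes: the second is preceded by 'n' and the third is followed by '_'.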

Public routines:

Init -- Prepare module for operation

Start -- Begin managing what has to be managed

OutputFunctions -- Specify functions to perform match output

MatchFunction -- Define routine to perform match

Pattern -- Specify RE to be searched

Configure -- Define how the module searches and reports matches

Search -- Perform specified search on a file

MatchedAny -- Report if any files matched search criteria

TraceryLink -- Tell Tracery how to deal with us

Private routines:

NewScanContext -- Prepare blank scan context block

MatchedAbandon -- Halt search if matching line found

Open -- Prepare file for scanning

ExpandNames -- Build a list of all files in a directory

RecurseDir -- Enumerate and search files in directory

DisplayBlock -- Display block of lines (for inverted match)

SearchBuffer -- Search one buffer of file

NoMatchFunction -- Place-holder to warn of incorrect config