MatchGCC

In the DOS version of Grouse Grep, the search could be executed either by MatchC (written in C) or by MatchI86 (lovingly handcrafted Intel assembly). This module, MatchGCC, was intended to sit alongside the other two. However, as the actions in the match engine became more sophisticated (especially CHECK_BUFFER), it became impractical to maintain an ANSI C version. The need for a handcrafted assembly version was also reduced once the Perl script was created to optimise the output of GCC. MatchGCC has therefore become the only implementation of the match engine. I'm still interested to know whether a handcrafted assembly routine could perform better, but for the moment, the flexibility of MatchGCC is more valuable than any potential performance improvement.

If Match is called with a NULL text pointer, the first part of the routine does no searching and instead writes the entry point of each action back to the caller. This is needed so that the state machine tables can be filled in with the correct values for the threaded assembly. The Perl script also looks for this block of code at the top of the routine, as it needs to find out which labels in the code map to finite-state machine actions. The actions themselves are defined by MatchEng.
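As a rough illustration of the NULL-pointer handshake described above, here is a minimal sketch. It assumes the entry points are taken with GCC's "labels as values" extension (&&label and goto *ptr). Every name in it (match_sketch, build_table, count_lines, the ACT_* codes) is hypothetical, and the single 256-entry table is a deliberate simplification standing in for the real state tables. It is not the code in this module.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action codes; the real actions are defined by MatchEng. */
enum { ACT_SKIP, ACT_COUNT_LINE, ACT_STOP, ACT_MAX };

/*
 * Sketch of a threaded routine (GCC labels-as-values extension).
 * Called with text == NULL, it only reports its entry points.
 * noinline/noclone keep the label addresses identical across calls.
 */
__attribute__((noinline, noclone))
static int match_sketch(const char *text, void **entries,
                        void *const *table)
{
    int lines = 0;

    if (text == NULL) {
        /* Publish the address of each action for the table builder */
        entries[ACT_SKIP]       = &&skip;
        entries[ACT_COUNT_LINE] = &&count_line;
        entries[ACT_STOP]       = &&stop;
        return 0;
    }

    /* Dispatch on the first byte; each action dispatches the next */
    goto *table[(unsigned char) *text];

skip:
    text++;
    goto *table[(unsigned char) *text];

count_line:
    lines++;
    text++;
    goto *table[(unsigned char) *text];

stop:
    return lines;
}

/* Fill a table with entry points: NUL stops, LF counts, others skip. */
static void build_table(void *table[256])
{
    void *entries[ACT_MAX];
    int i;

    match_sketch(NULL, entries, NULL);
    for (i = 0; i < 256; i++)
        table[i] = entries[ACT_SKIP];
    table['\n'] = entries[ACT_COUNT_LINE];
    table['\0'] = entries[ACT_STOP];
}

/* Count line terminators in a NUL-terminated string. */
int count_lines(const char *text)
{
    void *table[256];

    assert(text != NULL);   /* NULL is reserved for the handshake */
    build_table(table);
    return match_sketch(text, NULL, table);
}
```

In this sketch the table holds code addresses rather than action numbers, so each action reaches the next with a single indirect jump and no switch statement. That is the reason the caller must ask the routine for its entry points before building any tables.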

The actions in Match are a little cryptic to read in isolation. The best way to understand them is to use the -C and -D switches to observe the compiled RE codes and the generated state tables, then dig into the fairly large and daunting RETable to see how the tables are generated, perhaps using the -O switch to observe how the tables look before and after various optimisations. Once you've got a feel for how the state table selection and the action selection interact, you can see better how the details of each action's implementation fit into the machine's operation.

Public routines:

Init -- Prepare module for operation

Match -- Perform search and report success

TraceryLink -- Tell Tracery how to deal with us