Release 53  Last change: 970506   Last Document Update: 970506

DG                   USER COMMANDS                       DG

*
NAME
     dg - Data-Grep.  Like grep, but for searching a free-form
     flatfile database, printing the entire record rather than
     just the lines containing the searched-for phrase.
     Pronounced "dig" as in "digging out data."

*
SYNOPSIS
     dg [-options] srchstring infile 
        srchstring is optional (& ignored) with -s, -l, -U options
        srchstring is optional (not ignored) with -a, -A options
        srchstring may be multiple with -m, -M, -y options
        srchstring is disallowed with the numeric (-n) option
        srchstring may be a file (a list of search terms) with the -f option
        srchstring may be a file (a list of updates) with the -B option
        infile may not be wildcarded unless using the dgw batch file (below)

*
DESCRIPTION

dg will search a text file for a given phrase and print all "records" 
containing that phrase to standard output. 

dg is intended for free form "flat" files of text containing records 
(multi-line chunks or "paragraphs") separated by a defined delimiter 
character (default is "*").  The delimiter character must occur at the 
beginning of a line (but see options). Normally, no useful data should 
be present ON the delimiter line, as it would be lost on output except 
under certain options. 

A "paragraph" mode (-dd option) treats blank lines as record 
delimiters. 

Unlike grep, which would report only the specific lines containing the 
search term (or a fixed number of lines either side of the find), 
data-grepper will print the entire record in which the search term was 
found-- or a specified number of key lines for that record. 

The records of the data file are read in order, and records with 
"hits" are sent to standard output.  Maximum line length is 200 
(500 in UNIX versions). Overlong lines on input are tolerated, 
but split to meet max line length.  dg has no limitations as to 
record length, but the price paid is that it cannot accept standard 
input from a pipe (the input file is opened twice). 

The program does not directly support wildcards, nor does it 
understand all unix "regular expressions."  A wildcarded list of files 
to search may be done using the dgw batch file. 

The searchstring is normally used literally on the command line, or, 
by using the -f option, multiple searchstrings may be specified in a 
separate file (up to 100 in DOS, 1000 in UNIX).  Searchstring length 
is the same as the maximum linelength except when using the -f option, 
where searchstring length is limited to 20. 

        dg  searchterm  file_to_search

        dg -f file_of_searchterms file_to_search

If the searchterm contains spaces, the searchterm must be enclosed in 
single (UNIX) or double (DOS) quotes when used on the command line. 

Depending on options selected, the first 1-9 lines of each record may 
be treated as key lines.  Either the search or the report or both may 
be limited to these lines. 

        dg  -k3K5 searchterm  file_to_search

        will look for searchterm only in the first 3 lines of 
        each record, and print the first 5 lines of records
        when a match is found.

The output normally retains the "*" delimiters, thus becoming a subset 
of the original data file, ready for further searches. 

Other single characters can be used as delimiters by specifying a new 
delimiter on the command line. 

        dg  -d# searchterm  file_to_search

        will expect the # sign as record delimiter.

The special case ("paragraph mode"):

        dg  -dd searchterm  file_to_search

        will treat blank lines as record delimiters.


There is an option for additional, secondary sub-delimiters (which may 
be blank lines) if your data records are large enough to warrant this. 

Null records, those with only newlines between successive delimiter 
lines, are ignored and will not be present on output.  

Records with whitespace (spaces, tabs) are not treated as nulls. 

The original delimiter lines are normally a single "*" character, but 
may contain dashes or text following the delimiter character.  The 
extra characters are treated as "comments" NOT to be printed on output 
(unless the -R option is named).  Indeed, a whole series of delimiter-
prefixed lines may be included in the master file as "comments" or 
documentation, not to be printed on output. 

---------------------------------------------------------------------------
*
OPTIONS  (in order of average overall usefulness)


        -C         Case sensitive  (default is case insensitive).

        -c         Count only.  Report the number of records 
                   containing the search phrase.

        -v         inVert sense.  Report records not containing 
                   the search phrase.

        -d$        Use $ or other following char as Delimiter
                   Exception:  Use -dd (yes - lower case d repeated)
                   and the system will treat a blank line as the
                   delimiter for search (sort of like considering
                   paragraphs as records).  Output will, however,
                   insert the standard "*" delimiter.

        -dd        Paragraph mode. Blank lines are record Delimiters.
                   True blank lines only- no spaces or tabs.
                   See -d above.

        -k[n]      Search in Keyword-lines-only. Declare the first n (1-9)
                   lines as keyword-lines.  Default 1st line only.

        -K[n]      Limit output report to Keyword-lines-only.
                   The first n (1-9) lines are Keyword-lines.  
                   Default 1st line only.  n for -k and -K
                   options limited to 1 digit.  They are 
                   independent, and can be used together.

        -s         Status of data file: give record count only.
                   If used with -V option, reports misc. file data.
                   Ignores null records. (Reports them if verbose).
                   (overrides other options on the command line)

        -l         Treat aLL records as hits. No searchterm needed.
                   Useful with -D, -a, -A, -K options.
                   dg -Kpl will give an undelimited list of
                   first (key) lines. dg -Kplh# gives a similar
                   list of major/minor keylines for files with
                   major (*) records and sub(#)records. 
                   Note- Other uses of -l and -h together are
                   not recommended.

        -w         Match only on Words. (phrase bound by spaces 
                   or line boundary or any non-alpha, non-digit)
                   Underscore (_) is treated as part of a word.

        -u         Like -w but Underscore (_) is treated as a 
                   word delimiter (as if whitespace) as well.

        -x         Search phrase is found even if it crosses over 
                   a line boundary (X-over). One-line crossover 
                   only. Ignores trailing but not leading 
                   spaces on lines.  Best when used with -T 
                   to ignore leading spaces too.  Note: trailing
                   hyphens are also ignored so that normal word
                   hyphenation is dealt with.

        -f         Get searchstrings from File.  
                   Use filename to replace search phrase on the 
                   command line.  Leading and trailing spaces in
                   the file of phrases are stripped. For DOS, number of 
                   searchstrings in the file is limited to 100 
                   *and* a 20 character string size limit is imposed.
                   Finds are reported in the order they occur in the 
                   data file, not the order of the file of terms. 
                   (use batch files/unix scripts if you must extract 
                   records in other than data file order.) 

        -F[n]      The searchterm must be found in Field n of a
                   line to be considered a hit.  Incompatible
                   with -mMxnUD and ^$ usage.  A field of a line
                   is defined as in a default awk usage-- words or 
                   terms separated by whitespace, with leading/trailing 
                   whitespace ignored.  Use -F with no numerics to 
                   indicate the last field of a line regardless of
                   the number of fields there.  See extended discussion
                   below.

        -L         Affects -F option. Lax enforcement of field numbers
                   and lengths.  See extended discussion below.

        -m[n]      Expect n Multiple search terms on the command line,
                   each of which must be present on_a_single_line in a
                   record to cause a find. If n is omitted, n=2.
                   Incompatible with -v.  Max n is 9. If the searchterms
                   are identical, 1 hit suffices.  If used with -x,
                   finds must be within about 1 line of each other.

        -M[n]      Expect n Multiple search terms on the command line,
                   each of which must be present somewhere_in_the_record 
                   to cause a find. If n is omitted, n=2.
                   Incompatible with -v.  Max n is 9.

        -E[n]      Look for Extras-- expect 1 search term on the command
                   line, and report records having that term on at 
                   least n separate lines. If n omitted, n=2.
                   Incompatible with -v.

        -p         Plain output.  Do not print the delimiter on output.
                   Exception: with -y, kills only the sub-separator line.

        -e         Exact whole-line match required to cause a find.

        -T         Ignore L & R (lead/Trail) spaces on all lines. 
                   Useful with  -e or -x                                        

        -Q         Quit on first find of term; on first find of *each* term
                   when used with -f.  Useful with files that redundantly
                   repeat records, e.g. expanded procedural flows. If more
                   than one -f term is found in a record, all are satisfied
                   by printing that record.  Do not confuse this with 
                   -m or -M searches.  The -Q option then will quit on the
                   first find satisfying the -m or -M condition.

        -W         Print only the record numbers where the finds occur.
                   ("Which" records?)

        -h$        Use $ or other following char as an added, secondary 
                   "Helper" delimiter. The secondary delimiter will be 
                   recognized whether in the first or second position 
                   on a line.  Output will be preceded by the first 
                   line of the main record, and the phrase: "PARTIAL 
                   RECORD:"  Not compatible with -x option.
                   Exception:  Use -hh or terminal -h with no character
                   specified, and the system will treat a virtual blank 
                   line (true blank lines, or lines with only spaces/tabs)
                   as a secondary delimiter for search (sort of like 
                   considering paragraphs as sub-records within explicitly
                   delimited records).  Output will, however, insert the 
                   standard "*" delimiter. 
                   Example:  dg -hhCF1 -h dgman
                   will give help on the -h option of dg.

        -Dfname    Divide(distribute) output: 
                   Write records found to files fname0001, fname0002...
                   one file for each find. Supported ONLY as last option
                   in the option list.  Limited to 9999 output files.

        -n#[...]   Get record by Number, e.g. -n456 = get 456th record
                   Compatible ONLY with -vKqod$...  NOT with -aA
                   Null records are ignored when counting.
                   Supported ONLY as last option in the option list.
                   A syntax of -n#[#####],#[#####] is supported to retrieve 
                   a range of record numbers.  Particularly useful
                   when a large file must be divided.

        -a         Print whole data file, Append contents of zzapfile 
                   to finds. See discussion below: UPDATING RECORD STATUS

        -A         Print whole data file, Append zzapfile line 1 to 
                   keyline 1 of finds. See discussion below: UPDATING 
                   RECORD STATUS

        -j         Affects -a, -A options- don't print whole 
                   file, but Just the records with finds.

        -J         Affects -a, -A options- tacks a "Jumped" record 
                   number onto "found" records.

        -r         Print the delimiter followed by dashes (like a
                   Ruler line) to enhance visual separation of records. 

        -R         Retain content of original delimiter lines.
                   The default is to drop additional characters
                   following the delimiter. (The default permits
                   the delimiter line to contain "private" 
                   file documentation.)

        -B         Fold-in Big data updates. Allows automated updates
                   of large record sets based on a file of update
                   directions.  Highly useful but only in limited
                   circumstances.  See discussion below.

        -U         Uniqify a set of records. Directs deletion of 
                   repeated records based solely on the last field of 
                   the first keyline.  Limited filesizes except in
                   UNIX versions. See extended discussion below. 

        -V         Verbose. Show prefatory/summary remarks.  Use 
                   with -s for datafile status report. Use -Vq with
                   dgw batch file to record filenames searched.

        -H         Emphasize the line in the record where the
                   search conditions were met. Prints markers
                   (happy faces if in DOS) at beginning of the 
                   "Highlighted" line.  Seldom needed, but can be 
                   helpful when individual records are long.

        -N         Print a Negative message if no records are found.
                   Normally, there is no output when there are no 
                   finds.

        -o         Null argument. Does nOthing. Useful from 
                   some batch files/scripts. 

        -G         A Grep-like option. Only the lines with the
                   match are printed.  Use only if a real grep is 
                   unavailable. No REGEXP, but usable with the 
                   following options: 
                        -w, -u, -c, -v, -T, -C, -e, -f, -m, -F, -N, ^$
                   Not usable with -k,-K,-x,-D,-Q,-y
                   nor with most other options that are record-oriented.
                   Inappropriate options are not all trapped, but 
                   generally have no effect.

        -y         The grep-rest option. Unrelated to -G.
                   "digs" for a record, Yet greps it too.
                   Usable with -K such that IF a record is
                   a "hit" the -K keylines are printed, and
                   followed by any remaining lines in that record
                   that contain one of a set of other searchterms.
                   I.E.- print the keylines of finds and grep the
                   rest of the record for other searchterms.
                    The -m option and syntax must be used, but the
                    FIRST term given in -m syntax becomes the 
                    SOLE record searchterm and all OTHER -m terms 
                    become what we grep for after the keylines.
                   Example:
    
                        dg -ykK2m3r gold melt boil elements
    
                   will print the 1st 2 keylines of records in the
                   file "elements" having "gold" in the 1st keyline,
                   and then print any remaining lines in the record 
                   having the terms "melt" or "boil".  Use with 
                   the -r option for best visual separation of 
                   resultant records.

        -I         Ignore delimiter if repeated in place 2:
                   if a line begins with ** it is treated as
                   just a text line, not a delimiter line.
                   Useful with certain originals when you don't want
                   to clean them up first.

        -S         Add delimiters (Stars) to a file. A delimiter line is
                   added _before_ each line containing the search term.
                   Use -Sf and a file of searchterms when appropriate.

        -P         Add delimiters (Post-stars) to a file. Like -S, 
                   but delimiters are added _following_ each line 
                   that is a hit.

        -q         Quiet.  No extraneous prefatory/summary remarks
                   (default, but retained for historical reasons).
                   Exception: Use -Vq with dgw batch file to record 
                   filenames searched.

        -i[n]      Recognize an Indented delimiter anywhere in the first
                   n characters (1-9) of a line.  Useful in delimiting
                   code files when the delimiter must reside inside 
                   a comment, e.g., 
                   /* (c) , //* (c++) , #* (unix), ;* (lisp) , REM * (dos)
                   Especially useful with -T to kill leading whitespace
                   for files that have extensive indentation schemes.
                   Thus a -Ti option allows #* to work with any amount
                   of leading whitespace.

        -Z[Z][1]   FuZzy searches-- Look for approximate matches.     
                   The -Z option uses a SOUNDEX algorithm that assumes
                   the first letter of every word is unfuzzy. Use
                   the -ZZ option to fuzz even the first letter,
                   e.g., batter with a searchterm of "patter", but
                   expect lots of false hits.  A Z1 option uses a stem
                   algorithm that might find "silliness" when you 
                   search for "silly". All three fuzzy approaches 
                   are desperation moves, sometimes laughable.
                   You may need the -H option to figure out which
                   line caused the hit.  Expect "fuzzy" to be more
                   like "hairy" or even "wooly" most of the time.
       
                   The SOUNDEX approach is an old classic, which gives 
                   decent results when you must search with names or 
                   commonly misspelled words such as nuclear and 
                   personnel, but expect lots of extra drivel as well.  
                   Only the first few syllables are checked.  If you're 
                   curious, you can inspect the kind of coding produced 
                   for any searchterm by adding a -N option using a 
                   file you know will NOT produce a match.  The "not 
                   found" report will show the soundex or stem code of 
                   the searchterm.  Alternatively, add a -V verbose 
                   option and wade through the whole mess. 
       
                   Expect junk results if you use small searchterms, 
                   numeric searchterms, or searchterms that include 
                   spaces or punctuation. The -w option is disallowed.  
                   Although only words are really treated, there can be 
                   no guarantee of a true wordmatch.  The -e option is 
                   allowed, but a hit indicates exactness only in the 
                   coding string, not in the actual text. All fuzzy 
                   searches are automatically case-insensitive.

        -O         Show PrOgress-- when working very large files,
                   print some sign of life every 1000 lines 
                   to screen only.

        ^$         These are not command line options, but implied
                   options nonetheless.  Though full unix regular 
                   expressions are not supported, the ^ and $ 
                   expressions are:
     
                          dg ^foo filename
     
                   means look for "foo" at the beginning of a line.
     
                   Similarly:
                          foo$ means foo at the end of a line
                          \^foo means search for literal "^foo"
                          foo\$ means search for literal "foo$"
                          Note that a search for ^RAT$ is designed to
                             succeed on "RATCELLAR WITH RAT"
                          Use -e for the unix sense of ^RAT$
                             where the intent is SOL-phrase-EOL.


---------------------------------------------------------------------------
*
USAGE: General

This utility is not designed to replace full featured databases with 
formal query languages. It is suitable for keeping utility files, such 
as address or contact files or software requirements files, when the 
purpose of the search is not to settle just for individual lines 
containing the desired phrase, but to get the entire paragraph or 
record.  It is like grep with some notion of context. 

It is useful from the command line, but most powerful when used in 
batch files that grab a set of records and then do further processing 
on them. 

While dg is oriented to asterisk-delimited text files, any single-
character delimiter can be used, including blank lines. 

Given a data file of simple paragraphs separated by blank lines, dg 
can behave as if the blank lines were the "asterisk" delimiters: 

        dg -dd searchterm datafile

The output will be asterisk-delimited, unless you add the -p (plain) 
option. The blank lines must not have hidden spaces or tabs, unless 
you use the -T option (trim lead/trail spaces/tabs) option as well. 
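When dg itself is not at hand, awk's "paragraph mode" gives a rough 
stand-in for -dd with plain (-p style) output.  This is only a sketch 
of the behavior, not dg: 

```shell
# awk with RS='' splits input on blank lines, much as dg -dd does;
# print any blank-line-separated record containing the searchterm.
printf 'alpha one\nbeta two\n\ngamma three\nalpha four\n' > datafile
awk -v RS='' -v ORS='\n\n' '/gamma/' datafile
```

Unlike dg, this version reads happily from a pipe, since awk makes 
only one pass over the input. 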

----------------------------------------------------------------------
*
USAGE: Null Records

Null Records:
        A record is "null" if it has no bytes or only line-ending
        bytes. Null records are ignored for output, and when 
        counting to find an Nth record.

Null Keyword lines.
        The -sV option will report records that have no data on
        keyword lines.

----------------------------------------------------------------------
*
USAGE: With awk and grep in scripts

dg was originally designed to work in concert with awk in scripts 
or batch files-- working against initially unformatted text files. 

        An aside:  if you are not familiar with awk, you are
        missing one of the best tools available for manipulating
        text files.  Get a copy of Rob Duff's awk or the GNU
        gawk for DOS.  It has almost all the power of PERL,
        but when you read an awk script six months after you've
        written it, you'll understand it.  PERL is best only for
        folks who will use it every day.
        
Given a master file of records without explicit delimiters, an easily 
designed awk script can place delimiters at appropriate places in a 
temporary copy of the original file using either a simple or fairly 
sophisticated set of guidelines.  dg is then used to do searches on 
the temporary file.  If the master is updated, the awk script is re-
run to update the temporary file. 
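A minimal sketch of such a delimiting pass, where the (purely 
illustrative) rule is that a new record begins at every "NAME:" line: 

```shell
# Emit a "*" delimiter line ahead of each record head, writing a
# temporary delimited copy that dg can then search.
printf 'NAME: a\nx\nNAME: b\ny\n' > master
awk '/^NAME:/ {print "*"} {print}' master > tempcopy
cat tempcopy
```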

The dg -S or -P option can be used instead of an awk for very simple 
cases. Or-- if you simply want to trade blank lines for "*" delimiters, 
use a -ddl option; the output will have "*" delimiters where the blank 
lines were.

If your primary data is in a commercial database, you may find it 
useful to dump a subset of the database to a delimited ASCII file.  
Then, for the rest of the day, you can dig at it with dg, directly or 
from scripts, without needing to keep the (potentially memory-hogging 
or licensed-user-limited) database software running. 

----------------------------------------------------------------------
*
USAGE: If records have labeled lines

dg is powerful when used with grep against data files designed to have 
a number of labeled lines or "slots" in each record. 

With a file such as:

        NAME:   John Jones
        PHONE:  999-9999
        UNIT:   T-44
        EXPER:  C, C++, aerodynamics
        ASSIGN: Rufus GUI modules
        DUE:    JAN 95
        *
        NAME:   Jane Smith
        PHONE:  999-8888
        UNIT:   T-55
        EXPER:  LISP, scheduling, traffic flow, NL
        ASSIGN: Rufus NL interface
        DUE:    FEB 95
        *

a command line or script call such as:

        dg smith filename | grep EXPER 

would yield:

        EXPER:  LISP, scheduling, traffic flow, NL

or--
        
        dg -ykKm3 smith exper assign filename

would yield:

        NAME:   Jane Smith
        ...................
        EXPER:  LISP, scheduling, traffic flow, NL
        ASSIGN: Rufus NL interface


whereas:

        dg smith filename

alone would print smith's entire record.
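For readers without dg installed, the first pipeline above can be 
approximated with awk standing in as the record-grepper (a sketch 
only, using "*" as the record separator): 

```shell
# Build the sample file, then emulate: dg smith filename | grep EXPER
cat > filename <<'EOF'
NAME:   John Jones
EXPER:  C, C++, aerodynamics
*
NAME:   Jane Smith
EXPER:  LISP, scheduling, traffic flow, NL
*
EOF
# RS='*' makes each "*"-delimited chunk one awk record; tolower()
# mimics dg's default case-insensitive matching.
awk -v RS='*' 'tolower($0) ~ /smith/' filename | grep EXPER
```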

----------------------------------------------------------------------
*
USAGE:  AND searches

One can use a script that searches for records with the first term, 
redirecting output to a temporary file-- which is then searched for 
records with the second term: 

        dg  phrase1 filename > temp
        dg  phrase2 temp

Alternatively, use the -M option:

        dg  -M  phrase1 phrase2 filename
        dg  -M4 phrase1 phrase2 phrase3 phrase4 filename

If the search is intended to "AND" multiple phrases on a _single_ line, 
use the -m option.  This is particularly useful when, e.g., you want 
to find records containing "DEC" but only if the "DEC" is on a 
HARDWARE line and not on a MONTH or DATE line. 

                  dg -m DEC HARDWARE filename
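The distinction between the two AND flavors can be sketched in awk (an 
approximation over "*"-delimited records, not dg itself): 

```shell
cat > filename <<'EOF'
HARDWARE: DEC VAX
DATE: 12 JAN 94
*
HARDWARE: IBM PC
DATE: 3 DEC 94
*
EOF
# -M style: both terms anywhere in the record -- hits BOTH records
# here, because "DEC" also appears on the second record's DATE line.
awk -v RS='*' '/DEC/ && /HARDWARE/' filename
# -m style: both terms on one line -- hits only the DEC VAX record.
awk -v RS='*' -v FS='\n' \
    '{ for (i = 1; i <= NF; i++)
         if ($i ~ /DEC/ && $i ~ /HARDWARE/) { print; next } }' filename
```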

----------------------------------------------------------------------
*
USAGE:  OR searches

Put the set of searchterms in a file and use the -f option.
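Without dg, an OR search over delimited records can be sketched by 
folding the file of terms into one alternation pattern for awk (an 
approximation, not dg's -f): 

```shell
cat > terms <<'EOF'
melt
boil
EOF
cat > datafile <<'EOF'
gold: melts at 1064 C
*
iron: rusts
*
water: boils at 100 C
*
EOF
# Join the terms with "|" and match the pattern against each record.
pat=$(paste -s -d'|' terms)
awk -v RS='*' -v p="$pat" '$0 ~ p' datafile
```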

----------------------------------------------------------------------
*
USAGE: Searching Multiple Files

There is no provision for wildcards in the datafile name.

Each datafile must be searched individually.  
         (Use awk to create a script that calls dg against 
          each of a list of datafiles.)

Alternatively, use the following batch file or its unix equivalent.
Note that the results are always written to a file named ztempx,
in the current working directory.  The batch file will complain if
you try to search ALL (*.* or *) files in the current working
directory since that would include the output ztempx file.
If you must search ALL files in a directory (* or *.*), do so from
a higher level directory.  dgw -o srchterm asubdir/*.*
will work fine.

The dgw output will not name the file where finds are found,
unless you include a -Vq option.  If you do that, dg will 
insert a record naming each file searched.  To clean out those
advisories, just use dg again with dg -kv "Searching file:" ztempx
to get a "clean" set of results.
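A minimal unix counterpart to the batch file is just a for-loop; grep 
stands in for dg here so the sketch is self-contained (substitute a 
real dg call if it is installed): 

```shell
term=foo
mkdir -p sub
printf 'foo here\n' > sub/a.txt
printf 'nothing\n'  > sub/b.txt
: > ztempx                              # start with an empty results file
for f in sub/*.txt; do
    grep "$term" "$f" >> ztempx || true # "|| true": no hits is not an error
done
cat ztempx
```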

    ---------------------cut here -----------------------------------
    @ECHO OFF
    rem bat file to use dg with wildcarded list-of-files-to-search
    rem usage dgw -dg_arguments searchterm [*.txt or ad.* etc.] 
    rem note: a dg argument must be given, at least -o (a do-nothing)
    rem note: always writes result to file named ztempx (and sends to more)
    rem note: thus ztempx must not be in the scope of the wildcard
    
    IF "%1" == "" GOTO helps
    IF "%3" == "" GOTO error
    ECHO ======Executing the command dg %1 %2 %3 %4 %5 %6
    ECHO ======Will overwrite ztempx
    ECHO ======RETURN to continue, ctrl-C to quit
    PAUSE
    IF EXIST ztempx  DEL ztempx>NUL
    rem if touch program not available, use: @REM redirect_to ztempx
    rem touch ztempx
    @REM >ztempx
    
    rem current setup for 6 total args: supports, e.g., up to -m3
    rem it's possible to send 11 arguments with, e.g., -m9
    rem expand below to handle 11 if desired
    if exist %6 goto six
    if exist %5 goto five
    if exist %4 goto four
    if exist %3 goto three
    goto error
    
    :six
    FOR %%X IN (%6) do if %%X==ZTEMPX goto scope
    FOR %%X IN (%6) DO COMMAND/C dg %1 %2 %3 %4 %5 %%X >> ztempx
    goto didsearch
    :five
    FOR %%X IN (%5) do if %%X==ZTEMPX goto scope
    FOR %%X IN (%5) DO COMMAND/C dg %1 %2 %3 %4 %%X >> ztempx
    goto didsearch
    :four
    FOR %%X IN (%4) do if %%X==ZTEMPX goto scope
    FOR %%X IN (%4) DO COMMAND/C dg %1 %2 %3 %%X >> ztempx
    goto didsearch
    :three
    FOR %%X IN (%3) do if %%X==ZTEMPX goto scope
    FOR %%X IN (%3) DO COMMAND/C dg %1 %2 %%X >> ztempx
    goto didsearch
    
    :didsearch
    
    echo ================ FINDS: ===================================
    TYPE ztempx | more
    ECHO =========== Finds placed in file ztempx ===================
    GOTO end
    :scope
    echo ERROR- the wildcard term includes the output file "ztempx"
    goto paterror
    :error
    ECHO   dgw error.
    :helps
    ECHO   dgw is used to do a dg-search against a wildcard list-of-files.
    ECHO   e.g. " dgw -Kp searchterm *.foo " 
    ECHO   A dg argument must be used. Use -o for a do-nothing argument.
    ECHO   e.g. " dgw -o  searchterm *.txt " 
    :paterror
    ECHO   The output of each search is written to file "ztempx"
    ECHO   Be sure that the wildcard term cannot "see" the file ztempx
    ECHO   NAME.* or *.NAM is ok. But be in a separate directory to use * or *.*
    ECHO   e.g., NOT "dgw -o  searchterm *.*" NOR "dgw -o  searchterm *"
    ECHO   e.g., BUT "dgw -o  searchterm subdir/*.*" will work.
    :end

    ---------------------cut here -----------------------------------

----------------------------------------------------------------------
*
USAGE: Creating Tailored Data Sets from A Master

I need to maintain a large set of test datasets.  For the actual test, 
each must be an individual file, but maintenance is much easier if 
they are all kept in a single master file.  Each message is delimited. 
At test run, a script executes dg with the -D option, creating the 
individual files of the targeted datasets, before executing the actual 
tests that will act on the individual files. 

All datasets include one or more keywords such as "full", others 
"fullminus", and others "specialcase4"; the keywords indicate the 
class of test. Depending on need, a dg for the desired keyword 
produces the tailored test set files. 

----------------------------------------------------------------------
*
USAGE: Understanding the Multiple Terms Options ( -m, -M, -E, -y )

These options can be confusing, but each has been a lifesaver at one 
time or another.  These examples may help: 

     dg -m  foo fum     filename - a hit if foo & fum on a single line
     dg -m3 foo fum fay filename - a hit if all 3 on a single line
     dg -M3 foo fum fay filename - a hit if all 3 anywhere in a record
     dg -E3 foo filename         - a hit if foo is on at least 
                                     3 separate lines in a record 

The following are not useful searches, but they help explain the 
behavior when searchterms overlap: 

     dg -M3 foo foo foo filename - will succeed if 1 foo in a record
     dg -m3 foo foo foo filename - will succeed if 1 foo in a line

The -y option needs an assist from the -m option in meeting the
command line syntax, but the meaning of terms and behavior are very
different.  Also, the -y option may be used only with a -K option.

     dg -yKm3 foo fum fay filename

For this case, "foo" becomes the sole searchterm determining
whether a record is a hit.  For such records, the first keyline
is printed (-K), and for the remainder of the record, any lines
containing "fum" or "fay" will be printed.  The "m" in the options
serves only to bring in the extra searchterms; its "normal" meaning
is then ignored.
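
The -m / -M distinction can be sketched in Python (an illustration of
the matching rules above, not dg's actual code):

```python
def hit_m(record_lines, terms):
    """-m style: a hit if some single line contains every term."""
    return any(all(t in line for t in terms) for line in record_lines)

def hit_M(record_lines, terms):
    """-M style: a hit if every term appears somewhere in the record."""
    return all(any(t in line for line in record_lines) for t in terms)
```

Note how repeating a term changes nothing under either rule, which
matches the overlapping-searchterm behavior shown above.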

----------------------------------------------------------------------
*
USAGE: Eliminating Dupes in Multiply Appended Results Files

The -U option is generally useful only if you intend to search a large 
set of records several times, appending each result to a collection 
file.  Naturally this kind of job can result in a final file with 
quite a few duplicate records. 

To avoid this, first run dg against the master file (or a copy thereof) 
with an -aJ option to append a record number to the first keyline of 
each master record. 

Then run your multiple searches against this modified master, 
appending the results of all searches to the collection file.  
Finally, run dg with a  -U option to create a uniq'd final version. 

This option needs to build an array of last-fields "already seen."  To 
limit memory problems in DOS, no one "last-field" may exceed 10 
characters in length, and the total number of records to be culled may 
not exceed 200.  A simple awk can do the job for tougher cases.
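
The culling idea is simple enough to sketch in Python (illustrative
only; real dg enforces the DOS limits just described):

```python
def cull_dupes(records):
    """Keep only the first record seen for each first-keyline last field.

    Each record is a list of lines; the last field of line 1 is assumed
    to be the record number appended by -aJ.
    """
    seen, out = set(), []
    for rec in records:
        tag = rec[0].split()[-1]      # last field of the first keyline
        if tag not in seen:
            seen.add(tag)
            out.append(rec)
    return out
```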

----------------------------------------------------------------------
*
USAGE: The -F Field Option

The field option is only rarely of use, but very powerful when needed. 

This option allows you to limit search actions to specific fields of a 
line.  E.g., consider the command: 

        dg -F12k3 Elizabeth records

The F12 indicates that only field 12 of any line should be searched for 
the searchterm.  The k3 would further limit the search to only the 
first 3 lines of any record. 

A field is defined like a default "field" in awk-- words or terms 
separated by whitespace, with leading/trailing whitespace ignored. 

The -F option requires some limitations on the maximum field 
length and maximum number of fields per line.  For DOS, these limits 
are 40 and 20.  That is, no single field may exceed 40 characters, nor 
may any one line contain more than 20 "words." 

The option is designed primarily for files in which ALL lines stay 
within these limits.  Any field exceeding the max length will be 
truncated to "fit" and a warning posted to the screen.  Any one line 
exceeding the max number of fields will cause an error warning and the 
program will terminate. 

This behavior can help detect unintended errors in the way the data 
file was created-- if it was your intent to stay within the limits 
given. 

For other cases, you may intend that only certain lines will be 
"fielded" lines, and others should not be restricted.  Use the -L 
"lax" option to kill the complaints (-LF).  Any field past the 20th will 
just be ignored. Fields exceeding max length are quietly truncated. 

If -F is used with no numeric attached, the program assumes you intend 
to search the LAST field of the line.  Behavior for this special case 
will be correct even if the normal max number of fields is exceeded.  
Total line length limits will still apply. 
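
The field rules can be sketched in Python (an illustration of the
semantics above, not dg's implementation):

```python
def field_hit(line, term, field=None):
    """Search one awk-style whitespace field of a line for term.

    Fields are numbered from 1 as in awk; field=None models -F with no
    numeric attached, i.e. search the LAST field of the line.
    """
    fields = line.split()             # leading/trailing whitespace ignored
    if not fields:
        return False
    if field is None:
        return term in fields[-1]     # -F with no numeric: last field
    if field > len(fields):
        return False
    return term in fields[field - 1]
```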


----------------------------------------------------------------------
*
USAGE: Updating Record Status

If there is a need to append one or more new lines to selected records 
in a master file: 

        --   Put the append text in a file named zzapfile
        --   Run dg with the -a option.
        --   The entire file will be sent to stdout with the
                append text appended to records matching the 
                search text.            

If there is a need to append a phrase to the main key-word line of 
selected records: 

        --   As above, but use the -A option.
        --   The contents of line 1 of the zzapfile
                will be appended to the 1st line of records 
                matching the search text.

Use the -J option along with the -a or -A options to append as 
described, but inhibit printing of records that do not have a match. 
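
As a sketch of what -a (and -J) do, in Python (illustrative only, not
dg's code):

```python
def append_to_hits(records, term, append_lines, hits_only=False):
    """Append append_lines to each record containing term.

    hits_only=True models the -J option: records without a match are
    dropped from the output instead of passed through.
    """
    out = []
    for rec in records:
        if any(term in line for line in rec):
            out.append(list(rec) + list(append_lines))
        elif not hits_only:
            out.append(list(rec))
    return out
```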


----------------------------------------------------------------------
*
USAGE: Updating Records with the -B option

The -B option allows one to update certain kinds of record files from 
a manually (or otherwise) produced update file.  This is usable only 
with files that use line names at the start of each line. 

Such a file, the tgtfile, might hold records such as:

     john smith 
     title: staff engineer
     ssn: 999-999-9999
     salary: 44444
     hired: 960506
     *

If one creates an update file, foldfile, such as:

     ann smith ~salary: 45555
     john smith ~salary: 77777
     pat kelly  ~salary: 33333
     john smith ~ssn: 888-888-8888 

Then the command:

     dg -kB foldfile tgtfile > zz

would update john's (and ann's and pat's) salaries, as well as john's 
ssn, leaving other records untouched.  If john did NOT have a salary 
line, it would be appended as a new line in his record.  You can update 
multiple elements about john from a single update file.  (The updated 
results are in file zz; dg never changes the original record file.) 

The option assumes you will use unique key terms that will be found 
only once in the designated number of keylines. For example, an 
update file such as 

     smith ~salary: 45555
     john ~ssn: 777-777-7777 
     pat ~salary: 33333 

will update john smith's ssn or his salary but not both.  If "john 
smith" were used instead, both lines would be updated. 

This is important to understand, especially if you tell the search to
continue over more than one keyline (by using, e.g., -k3).  If a valid 
hit is found in, say, keyline 2, then actions will be taken based on 
that hit-- and only based on the first hit in that line.  If you 
expected additional actions based on a second possible hit in keyline 
2 -- or a separate hit in keyline 3-- you will be disappointed.  
The search looks no further than the first hit.

The line title (e.g., ~salary) is always case sensitive.  Use of the 
-C option will make searches for the key term (e.g., john smith) also 
case sensitive. 

Note that line title in the update line should be identical to the 
one that is used in the file of records if you want to preserve the 
original line name. Otherwise the updated record will take on the 
line title provided in the update file.  For example, the update line 
"john ~STATUS OK" will find and replace "STATUS----: BAD", but the new 
line will be "STATUS OK", not "STATUS----: OK". 

Limitations:  The program will look for "john" only in the keylines 
specified.  You must use the -k[] option with -B to designate how 
many keylines are to be searched. A maximum of 20 data elements about 
john can be used in the file of updates. Data in keyline 1 cannot be 
changed.

In general, don't try to overwork this option.  It's fine for limited
cases.  For more complex work, use awk.
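
As a rough Python sketch of the -B mechanics described above
(illustrative only -- real dg also enforces the keyline and 20-element
limits):

```python
def apply_update(records, update_line, keylines=1):
    """Apply one "key ~title rest" update line to a list of records.

    Records whose first `keylines` lines contain key get their first
    line starting with title replaced by the update text; if no such
    line exists, the update text is appended as a new line.
    """
    key, upd = (s.strip() for s in update_line.split("~", 1))
    title = upd.split()[0]              # e.g. "salary:"
    for rec in records:
        if any(key in line for line in rec[:keylines]):
            for i, line in enumerate(rec):
                if line.startswith(title):
                    rec[i] = upd        # new line takes the update's title
                    break
            else:
                rec.append(upd)         # no such line: append it
    return records
```
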
----------------------------------------------------------------------
*
USAGE: Non-ASCII Documents

dg works ONLY with ASCII files. If you are keeping your master records 
using document publication software, files saved will normally be 
in other than pure ASCII.  All need not be lost.  Most such software 
allows saving a pure ASCII version as well.

I've had to keep quite a few documents in ready-to-publish form
using Framemaker or Interleaf.  Whenever edits are made, I simply 
create an extra, updated ASCII version as well, either directly or 
using an awk to strip out the formatting and graphics in the extra 
copy. 

----------------------------------------------------------------------
*
USAGE: Help

Typing dg with no arguments provides some cryptic help.

Use the dgh.bat batch file or an equivalent unix script to get a bit better 
help on a particular option.  Replace the "\dg\dgman" with your own path to 
the dgman file. 

With this batch file in your path, enter:

    dgh -y

to see the part of dgman that describes the -y option.

Alternatively, use the dghh.bat file (or equivalent) to get help using
a likely keyword:

    dghh dash

will show help on the -r ruler-line option.  Replace the "\dg\dgops.sam"
with your own path to that file.

Note: The dgman and dgops.sam files contain embedded spaces on certain 
apparently blank lines to keep certain help sections together when using
the -dd option.  For example, see the help for -y. 
----------------------------------------------------------------------
*
USAGE: Option Confusion

You can come up with a lot of different option combinations using dg.
When you get some combination that does what you want, put it in a
batch file or an alias command.  Let the computer do the remembering.
The dg.bat file shown above is a good example of usage.

----------------------------------------------------------------------
*
BEHAVIOR:  Treatment of Punctuation

DOS BEHAVIOR:   
    In a searchterm, <>| must be quoted. The ; and " symbols can be in 
a searchterm only if using the -f option. A backslash (\) may be in a 
searchterm but must not immediately precede a double quote ("). The % 
symbol can be in a searchterm only if using the -f option or using the 
command line directly; from a DOS batch file, the % symbol would be 
lost. 


UNIX BEHAVIOR:
    Generally less silly. If you must include punctuation in a searchterm,
you may or may not need to use single quotes around the term.  Experiment.

----------------------------------------------------------------------
*
BEHAVIOR:  "WORDSEARCH" (-w)

A searchterm "hit" meets -w wordmatch criteria as long as the hit:
     -- is bound on left  by: SOL, non-alpha, non-digit, non-underscore
     -- is bound on right by: EOL, non-alpha, non-digit, non-underscore
         (except with -U option, where underscore IS treated as wordbreak)

A "word" bounded by punctuation remains a word.  Thus !@#$wow#$&%  
will qualify in a wordsearch for "wow".  SOL means "Start of Line" 
and EOL means "End of Line". 

ALSO-- digits or punctuation INSIDE the searchterm do not disqualify it 
as a "word." For example, "walla6*%^walla7" can be a word "hit" for 
the "word" "walla6*%^walla7" since wordmatch only checks the area left &
right of the "hit." 
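
The boundary test amounts to the following Python sketch of the rule
(not dg's code; the -U underscore exception is not modeled):

```python
def word_hit(line, term):
    """True if some occurrence of term in line meets -w word criteria:
    bounded left and right by SOL/EOL or a non-alpha, non-digit,
    non-underscore character."""
    start = line.find(term)
    while start != -1:
        end = start + len(term)
        left_ok = start == 0 or not (line[start - 1].isalnum()
                                     or line[start - 1] == "_")
        right_ok = end == len(line) or not (line[end].isalnum()
                                            or line[end] == "_")
        if left_ok and right_ok:
            return True
        start = line.find(term, start + 1)
    return False
```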

----------------------------------------------------------------------
*
BEHAVIOR:  UNPRINTABLE CHARACTERS:

The program is not designed to deal with control characters or 
characters above ASCII 127, whether in the text or in the searchterms.  
Consider the behavior unpredictable when these are present. 

----------------------------------------------------------------------
*
EXAMPLES

--- Random Files:
The simplest example uses are for keeping randomly organized address 
or contact records, system/software requirements statements, multi-
line quotations or references, mini-help files, to-do files, scheduled 
appointments, descriptions of hobby collectibles, recipes, or simply 
random ideas.  Just separate all "data chunks" with a "*" delimiter. 

--- Unordered ASCII Documents (Paragraph Mode):
Any documentation you reference often, or need to extract from, can be 
dg-searched to get the right "paragraphs" to standard output, even if 
the only delimiters are blank lines. 

        dg -dd phrase filename

        will provide all paragraphs holding the phrase, and

        dg -vdd phrase filename

        will provide all paragraphs NOT holding the phrase.
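
In Python terms, paragraph mode amounts to this sketch (illustrative
only):

```python
def para_grep(text, phrase, invert=False):
    """Return paragraphs (blank-line-delimited) containing phrase.

    invert=True models the -v option: paragraphs WITHOUT the phrase.
    """
    paras = [p for p in text.split("\n\n") if p.strip()]
    return [p for p in paras if (phrase in p) != invert]
```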


--- Master Documents with Summary Lists    (Advanced usage)
Often a master document of, say, software trouble reports, can be 
delimited and then searched for related topics.  The master may be a 
manually-maintained one or the ASCII result of a database query. 

        Item: A22
        Block:  Control
        Date:   940522
        Short_Title:    Foo fum
        Description:  This item is not performing correctly whenever 
        the input is made on a Tuesday before 2:15 PM.  Behavior normal
        at all other times.
        Supply Data:  Acme Co. and Ray-Bolixer Inc.
        Priority:     3
        Assigned:     Joe
        Due:          940610
        *
        Item:   A33
        etc.

        *


Perhaps you've created a priorities listfile from it that you use to 
keep track of the big picture: 

Item    Block    Date    Short_Title     Priority    Assigned to   Due
A22     Control  940522  Foo_fum         3           Joe           940610
A33     Charts   940527  Fum_fee         2           Jane          940616
A44     A-Object 940528  Fee_fie         4           Joe           940622
A55     Control  940530  Fo_fum          1           Dan           940604

Assuming the Block designation is in one of the first, say 3, keylines 
of each record, try this: 

gawk '$2=="Control"{print "dg -k3 " $1 " masterdoc >> ctlhits.txt"}' listfile > temp1.bat
call temp1

The awk creates a batch file of dg calls (with "masterdoc" standing in 
for your master document file) that will collect the details on the 
Control block problems in ctlhits.txt.  Of course, if the records are 
in a full-fledged database, you could query it directly.  The dg 
approach is primarily of value in batch files/scripts-- especially if 
the data source is not worth entering or maintaining in a full-fledged 
database system. 

----------------------------------------------------------------------
*
SEE ALSO
     grep, awk, sed

----------------------------------------------------------------------
*
BUGS & LIMITATIONS

Max line length in input file:             200 (500 in unix versions)
Max searchstring length:                   200 (500 in unix versions)
Max num of searchterms in -f searchfile:   100 (1000 in unix versions)
   Max searchstring length with -f:         20 (500 in unix versions)
Max number of fields for -F option:         20 (100 in unix versions)
Max field length for -x, -F options:        40 (100 in unix versions)
Max number of records when killing dupes:  200 (500 in unix versions)

Overlong lines on input are tolerated, but truncated as far as the 
search is concerned.  

Error management for insensible option combinations is provided only 
for the most common mismatches. 

LIMITATION: Use of a file of searchterms:

The -f option provides results only in the order that hits occur in 
the original data file.  If you want results in the order listed in 
the file of search terms, use the following awk & script workaround: 

      Create the file of search terms as "srchterms".
      (A plain dg -f srchterms datafile would give results in the
         order the terms are found in datafile.)
      To get results in the order of terms in srchterms, run:
         awk -f thisawkfile srchterms > temp.bat
      Then run the resulting temp.bat file.  
      Results will be in "results.txt".
      thisawkfile:
          BEGIN{
            # assuming the datafile is "datafile"
            # and the file of searchterms is "srchterms"
            datafile = "datafile"
            # following is for DOS; use 39 (') for unix
            q = sprintf("%c", 34)
            print "del results.txt"
            # unix only:  print "touch results.txt"
          }
          # main: print one dg call per searchterm; these land in
          # temp.bat when called as
          #    awk -f thisawkfile srchterms > temp.bat
          {
            print "dg -k " q $0 q " " datafile " >> results.txt"
          }

LIMITATION: No pipe capability TO dg:

The inability to accept input from a pipe can be annoying, but it was 
a tradeoff for efficiency.  The program opens the input file twice, 
using the first file handle to do the searching while the second plays 
a "follower" role, printing records that are "finds."  This approach 
avoids the need for large memory allocations, thus allowing unlimited 
record lengths.  Unfortunately stdin cannot be "opened twice," so 
piping the output of other commands to dg has been sacrificed to avoid 
record-size limitations.  OUTput from dg can be redirected through a pipe.
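
The scheme can be sketched in Python (a rough illustration of the
two-handle design, not dg's actual code):

```python
def two_pass_grep(path, term, delim=b"*"):
    """Print-worthy records via two handles on the same file.

    The scanner handle looks for hits line by line; when a record ends,
    the follower handle seeks back to the record start and copies the
    whole record out.  No record ever has to fit in memory -- and this
    is exactly why stdin, which cannot be opened twice, won't do.
    """
    out = []
    with open(path, "rb") as scanner, open(path, "rb") as follower:
        rec_start, hit, pos = 0, False, 0
        for line in scanner:
            if line.startswith(delim):      # delimiter line ends a record
                if hit:
                    follower.seek(rec_start)
                    out.append(follower.read(pos - rec_start).decode())
                rec_start, hit = pos + len(line), False
            elif term in line:
                hit = True
            pos += len(line)
        if hit:                             # last record may lack a delimiter
            follower.seek(rec_start)
            out.append(follower.read(pos - rec_start).decode())
    return out
```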

Bugs:

Certainly.  Let me know what you find.

-- Pete Marikle
----------------------------------------------------------------------
*

