Documentation for 4DECOMP Version 1.01, (c) 1999 by Akisoft, Vienna

1. Purpose of 4DECOMP:
~~~~~~~~~~~~~~~~~~~~~~
4DECOMP is a small command-line program, supplied in binary (4DECOMP.EXE)
and source (4DECOMP.PAS) form, which decompresses .BTM files compressed with
4DOS' BATCOMP. Version 1.01 supports both the 4DOS 5.0 and the 6.02
compression algorithms.

2. Installation information:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Unpack the archive anywhere you want and run 4DECOMP to get brief help on
its parameters. You may copy 4DECOMP.EXE to your C:\4DOS directory to
reflect its status as a 4DOS tool... :-) If you are unsure about trojans
etc., grab yourself a Turbo Pascal 5.x or later compiler and recompile the
.PAS file.

3. Status of the program:
~~~~~~~~~~~~~~~~~~~~~~~~~
The program is Public Domain. You can do whatever you want with the program
except for faking changes to the archive (to let others think I did things
that I didn't). You can alter the source or the binary, but you have to keep
a notice printed at any execution of the program showing that I am the
author (at least of the main idea and the used decryption algorithm).

4. Distribution status:
~~~~~~~~~~~~~~~~~~~~~~~
Freely distributable.

5. How to contact the author:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthias Waechter
Student at the Technical University of Vienna, Austria, Europe
<e9125812@student.tuwien.ac.at>

A. Crypto information:
~~~~~~~~~~~~~~~~~~~~~~
I don't think that the compression used by 4DOS's BATCOMP, and thus by
4DECOMP, has a lot to do with "Crypto". So U.S. export restrictions should
not apply to either of them.

Now the real stuff...
~~~~~~~~~~~~~~~~~~~~~
Since version 5.0, 4DOS from JP Software has contained a tool named BATCOMP
which is intended to compress .BTM files (batch-to-memory, i.e.
non-self-modifying batch files) so that they use less space on the hard disk
and are unreadable to those running them. The standard DOS and Win9x
COMMAND.COM supports batch files (.BAT extension) which are slow, inflexible
and in fact useless; only small "ECHO hello world!" type programs make
sense with these command interpreters. 4DOS is an alternative command
processor which introduces a lot of additional features. 4DOS is available
from JP Software directly (http://www.jpsoft.com/) and from a lot of
shareware sites around the world, e.g. simtel.net (http://www.simtel.net/),
as is 4DECOMP. Please always look for your nearest mirror site. Remember,
4DOS is shareware; 4DECOMP is public domain and thus freeware.

4DOS (and its derivatives 4OS2, 4NT and Take Command) gives batch files a
new purpose: one can now use subroutines, functions and so on to make
batch files more useful. Take a look at your 4DOS distribution for examples
of how to use all those features (you can also use 4DECOMP to decompress the
4DOS installation script _4INST.BTX). Anyway, since batch files can be used
to do really complex work with 4DOS, on the one hand they grow in size, and
on the other hand their authors become interested in distributing only the
functionality of a batch file, not its source code. JP Software solved both
problems with one big hit: BATCOMP. The tool, which in fact should be named
"BTMCOMP", compresses text files of up to 64k in size to meet both
requirements: the files shrink in size and they become unreadable to the
person executing or distributing them.

The problem is that the decryption has to be fast (i.e. easy) for 4DOS,
because it has to be performed every time the program is started. So
JP Software used a compression algorithm which on the one hand does not
compress nearly as well as e.g. the ZIP format and on the other hand is
easy to decompress. Asking myself how "unreadable" the compressed files
are, I tried compressing a few text files with BATCOMP. It didn't take long
for me to find out how the compression works, so I wrote a decompressor.

The response to this program (4DECOMP 1.0) was enormous: a lot of people
were desperately seeking such a tool, because they had compressed a batch
file, removed the readable source code and found themselves lost without any
possibility to fix things or to reuse their code. I think that in some cases
one can contact JP Software and ask them to decompress a file, but since
4DECOMP is out, anyone can decompress those compressed batch files.

Of course, 4DECOMP weakens the feature set of BATCOMP, because the
unreadability is lost now. But I think it is better that everyone has the
chance to decompress compressed .BTM files than only a few hackers who do it
by stepping through 4DOS.COM or by writing their own 4DECOMP which they
don't make publicly available. I think it is better for a user of BATCOMP to
know that something like 4DECOMP is accessible to everyone than to rely on
the hope that none of the hackers who have such a tool will ever see his
secret code.

More than 5 years have passed since JP Software released 4DOS 5.0. A lot of
changes were made to 4DOS, but until recently, the "encryption" scheme
introduced by the BATCOMP tool remained the same. Of course, as always, the
dilemma is to be "upwards compatible" and "downwards compatible" at the same
time, which in fact meant keeping everything as it is. So even after the
release of 4DECOMP 1.0 in Dec 1993, JP Software didn't change the compression
algorithm.

Version 6.02 of 4DOS, which was released recently, broke out of this dilemma
by simply presenting a "new" compression algorithm. Well, it didn't take long
for me to find myself asked whether there was any chance of getting a new
4DECOMP, since the old version refused to decompress files created with the
new BATCOMP utility enclosed in 4DOS 6.02. OK, friends, here it is:
Ta-tadaaaa: 4DECOMP 1.01! :-)

Of course, I could have called it 2.0 or so since it comes 5 years after
the last version of the program... but the changes to the compression scheme
and thus to the code of 4DECOMP are so small that I thought about calling it
"4DECOMP 1.00.01" or so, but 1.01 sounds better, I think.

Well. Now to the changes to the compression system.

In fact, comparing the compressed files of the old and new algorithm, the new
one only produces larger files. To be precise, the files are _exactly_ 2 bytes
longer. You guessed it: In general, the compression remained the same.

Well, besides these two bytes and changes in the file header, the "most
frequently used characters" have been encrypted to make it even more difficult
to see that the file layout remained the same. OK, let me try to explain it in
more detail (as in VER100.DOC for the old format).

File format of the compressed file:
===================================
(like everything in this package distributed without any warranty)

The file is split into three sections: The first section is the "header", the
second section is the "most frequently used characters table", and the third
section contains the compressed characters.

The "header" consists of 6 bytes. The first two contain BEh EBh (swapped in
comparison to the old format). The next two (new) bytes contain something
whose purpose I currently do not know; maybe it is a field to specify the
compression type for future enhancements. After these new bytes, which
currently have the value "8C 00", we find two bytes specifying the size of
the original BATCH file in bytes, stored low byte first.
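
The header layout described above can be sketched in a few lines of Python.
This is only an illustration, not the code of 4DECOMP.PAS: the function name
parse_header and the dictionary keys are made up here, and the low-byte-first
reading of the size field is inferred from the sample file ("28 00" = 0028h).

```python
import struct


def parse_header(data: bytes) -> dict:
    """Split the 6-byte header of a 4DOS 6.02 style compressed .BTM file."""
    if len(data) < 6:
        raise ValueError("file too short for a 6-byte header")
    magic = data[0:2]
    if magic != b"\xBE\xEB":
        raise ValueError("not a 6.02-style compressed .BTM file")
    # Bytes 2-3: purpose unknown ("8C 00" in the sample), kept as raw bytes.
    unknown = data[2:4]
    # Bytes 4-5: size of the original file, low byte first (inferred from
    # the sample, where "28 00" stands for 0028h).
    (size,) = struct.unpack_from("<H", data, 4)
    return {"magic": magic, "unknown": unknown, "size": size}


header = parse_header(bytes.fromhex("BE EB 8C 00 28 00"))
print(header["size"])  # 40
```

Keeping the two unknown bytes as raw data instead of interpreting them avoids
guessing at a meaning the documentation does not give.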

The encryption uses the following scheme: starting at the 7th byte of the
file (offset 0006h), the 30 most frequently used characters are stored. The
rest of the file contains nibbles (2 nibbles per byte). The first 14 of the
most frequently used characters have nibble codes from 2 to 15 (2h to Fh).
The other 16 have 2-nibble codes (these take the same space as one byte, but
may be split across 2 bytes): the first nibble is always 1 and the second
goes from 0 to 15 (0h to Fh). To encrypt the translation table, the entries
are XORed against each other: the first stored byte must be XORed with the
second stored byte to get the real character needed for decompression, the
second must be XORed with the third, etc. Only the last character is _not_
encrypted in any way, so it is cleartext (as all letters were in the old
compression scheme).
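
As a small sketch of the table decryption (the names decrypt_table and STORED
are chosen for this illustration only), each stored byte is XORed with the
stored byte that follows it, and the last byte is copied unchanged. The 30
input bytes are the ones at 0006-0023 of the sample file shown further down.

```python
def decrypt_table(stored: bytes) -> bytes:
    """Undo the XOR chaining of the 30-entry character table."""
    if len(stored) != 30:
        raise ValueError("the table must be exactly 30 bytes long")
    out = bytearray(stored)
    # Each entry is XORed with its still-encrypted successor.
    for i in range(len(stored) - 1):
        out[i] = stored[i] ^ stored[i + 1]
    # The last entry is stored as cleartext and stays as it is.
    return bytes(out)


STORED = bytes.fromhex(
    "0A 2A 6C 23 40 25 4D 22 2F 6C 29 61 00 62 "              # 1-nibble codes
    "06 60 07 6E 04 6F 03 6E 00 70 01 73 00 74 01 77"         # 2-nibble codes
)
table = decrypt_table(STORED)
print(table[:14])  # b' FOceho\rCEHabd'
print(table[14:])  # b'fgijklmnpqrstuvw'
```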

The compression itself is in the last block: every byte is split into two
nibbles, which are processed high-before-low as one continuous sequence, so a
code may start in one byte and end in the next. If a nibble has a value of 2
or above, the corresponding entry of the most frequently used characters
table is used. If a nibble is "1", then the next nibble decides which entry
from the 2nd part of the most frequently used characters table to use. If
the nibble is 0, the next two nibbles form a character not contained in the
most frequently used characters table. These two nibbles are swapped (the
low nibble comes first) and then taken as one character.
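
The three cases can be sketched as follows. This is a minimal illustration
and not the original Pascal source: decode_tokens and TABLE are hypothetical
names, the table is assumed to be already decrypted, and the count parameter
assumes that the caller knows how many characters to produce (in the sample
file the size field of the header matches this number).

```python
def decode_tokens(packed: bytes, table: bytes, count: int) -> bytes:
    """Decode `count` characters from a nibble stream using a 30-byte table."""
    # Split every byte into its high nibble, then its low nibble.
    nibbles = [n for byte in packed for n in (byte >> 4, byte & 0x0F)]
    out = bytearray()
    pos = 0
    while len(out) < count:
        needed = {0: 3, 1: 2}.get(nibbles[pos] if pos < len(nibbles) else 2, 1)
        if pos + needed > len(nibbles):
            raise ValueError("nibble stream ends too early")
        code = nibbles[pos]
        if code >= 2:
            # Codes 2h..Fh: the 14 most frequent characters.
            out.append(table[code - 2])
        elif code == 1:
            # Prefix 1: the next nibble selects one of the other 16 entries.
            out.append(table[14 + nibbles[pos + 1]])
        else:
            # Escape 0: literal character, low nibble first, then high nibble.
            out.append(nibbles[pos + 1] | (nibbles[pos + 2] << 4))
        pos += needed
    return bytes(out)


# Decrypted table of the sample file (see the example below).
TABLE = b" FOceho\rCEHabd" + b"fgijklmnpqrstuvw"

print(decode_tokens(bytes.fromhex("BA C4 24 33"), TABLE, 8))  # b'ECHO OFF'
# 1,F = "w"; 0,8,7 = literal 78h = "x"; the final nibble is only padding here.
print(decode_tokens(bytes.fromhex("1F 08 70"), TABLE, 2))     # b'wx'
```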

Summary: If there are only a few different chars inside the text, for example
because it uses a lot of ECHO directives, the text can be compressed to about
one half of its original size (1 char uses only 1 nibble of storage space).
If there are a lot of different characters in the text, and most of them
appear often, it is possible that the "compressed" file is larger than the
original batch file, since a character outside the table needs 3 nibbles and
the header and table add 36 bytes.
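
The size arithmetic can be tried out with a rough estimator. It rests on
assumptions that are not confirmed anywhere in this document: that BATCOMP
fills the table strictly by character frequency and that it stores CR+LF as
a single character. Both match the sample file below, but the function name
and the whole approach are only an illustration.

```python
from collections import Counter

HEADER_AND_TABLE = 6 + 30  # bytes in front of the nibble stream


def estimate_compressed_size(text: bytes) -> int:
    """Estimate the size in bytes of the compressed form of `text`."""
    # Assumption: CR+LF is stored as one character (0Dh).
    text = text.replace(b"\r\n", b"\r")
    # Only the frequencies matter for the total, most frequent first.
    counts = sorted(Counter(text).values(), reverse=True)
    nibbles = 0
    for rank, count in enumerate(counts):
        if rank < 14:
            cost = 1   # single nibble 2h..Fh
        elif rank < 30:
            cost = 2   # prefix 1 plus one nibble
        else:
            cost = 3   # escape 0 plus two nibbles
        nibbles += cost * count
    # Two nibbles per byte, rounded up.
    return HEADER_AND_TABLE + (nibbles + 1) // 2


sample = b"ECHO OFF\r\necho abcdefghijklmnopqrstuvwxyz"
print(len(sample), estimate_compressed_size(sample))    # 41 67
print(estimate_compressed_size(b"A" * 1000))            # 536
```

For the sample the estimate of 67 bytes equals the length of the dump below
(0000-0042), so this tiny file grows; the run of 1000 identical characters
shrinks to roughly half its size plus the 36 fixed bytes.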


Let me repeat the example from the old version to show the differences between
the two formats:

Original .BTM-File:

ECHO OFF
echo abcdefghijklmnopqrstuvwxyz

Compressed .BTM-File (HEX-output of DEBUG):

0000  BE EB 8C 00 28 00 0A 2A-6C 23 40 25 4D 22 2F 6C   ....(..*l#@%M"/l
0010  29 61 00 62 06 60 07 6E-04 6F 03 6E 00 70 01 73   )a.b.`.n.o.n.p.s
0020  00 74 01 77 BA C4 24 33-96 57 82 DE 5F 61 01 17   .t.w..$3.W.._a..
0030  12 13 14 15 16 17 81 81-91 A1 B1 C1 D1 E1 F0 87   ................
0040  09 70 A7                                          .p.

*)      0000: BE EB, indicates compressed .BTM-file type >=Ver. 6.02
*)      0002: 8C 00, purpose unknown for now, maybe additional compression
              algorithm selection.
*)      0004: 28 00, size of original .BTM-file (0028h=40 bytes).
*) 0006-0013: The 14 most frequently used characters, each XORed against
              the next. So, to get the _real_ first character of the
              translation table, XOR the value 0Ah with its successor 2Ah
              to get 20h (=space). To get the 2nd entry, XOR 2Ah with 6Ch
              which gives 46h, and so on. After decryption 0Dh stands for
              both 0Dh and 0Ah (carriage return+line feed). So, the 14
              most frequently used characters are:
              Token:  2  3  4  5  6  7  8  9  A  B  C  D  E  F
              ASCII:  20 46 4F 63 65 68 6F 0D 43 45 48 61 62 64
              Char:   SP F  O  c  e  h  o  CR C  E  H  a  b  d
*) 0014-0023: The other most frequently used characters; the decryption
              started in the previous table continues here. The last
              character (77h="w") remains unencrypted. These chars are
              decrypted:
              Token:  10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
              ASCII:  66 67 69 6A 6B 6C 6D 6E 70 71 72 73 74 75 76 77
              Char:   f  g  i  j  k  l  m  n  p  q  r  s  t  u  v  w
*) 0024-0042: Tokens for the characters used in the original .BTM-file:
              BA = Token Bh ("E"), then Token Ah ("C")
              C4 = Tokens for "H" and "O"
              24 = Tokens for " " and "O"
              33 = twice the Token for "F"
              96 = Tokens for NEW-LINE and "e"
              57 82 DE 5F = Tokens for "cho abcd"
              61 = Token for "e" and prefix for second table
              01 = Token 0 from second table = "f" + prefix for second table
              17 = Token 1 from second table = "g" + first-table Token 7 = "h"
              12 13 14 15 16 17 = Tokens from second table, "ijklmn"
              81 = Token for "o" + prefix for second table
              81 91 A1 B1 C1 D1 E1 = Tokens "pqrstuv" from second table, each
                   followed by the prefix for the next one
              F0 87 = Token Fh from second table = "w", escape 0, then nibbles
                   8 and 7 swapped to give character 78h="x"
              09 70 A7 = escape 0, nibbles 9 and 7 = "y" (79h), escape 0,
                   nibbles A and 7 = "z" (7Ah)
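
The annotated dump can be checked end to end with a short script. Again this
is a hedged sketch rather than the code of 4DECOMP.PAS: the names DUMP and
decompress are made up, the size field is taken as the number of characters
to produce (with CR counted once), and expanding every 0Dh to CR+LF follows
the remark in the table description above.

```python
DUMP = bytes.fromhex(
    "BE EB 8C 00 28 00 0A 2A 6C 23 40 25 4D 22 2F 6C "
    "29 61 00 62 06 60 07 6E 04 6F 03 6E 00 70 01 73 "
    "00 74 01 77 BA C4 24 33 96 57 82 DE 5F 61 01 17 "
    "12 13 14 15 16 17 81 81 91 A1 B1 C1 D1 E1 F0 87 "
    "09 70 A7"
)


def decompress(data: bytes, expand_newlines: bool = True) -> bytes:
    """Decompress a 6.02-style .BTM image held completely in memory."""
    if data[:2] != b"\xBE\xEB":
        raise ValueError("not a 6.02-style compressed .BTM file")
    # Assumption: the size field counts decoded characters, low byte first.
    size = int.from_bytes(data[4:6], "little")
    stored = data[6:36]
    # XOR every table byte with its successor; the last one is cleartext.
    table = bytes(a ^ b for a, b in zip(stored, stored[1:])) + stored[-1:]
    nibbles = [n for byte in data[36:] for n in (byte >> 4, byte & 0x0F)]
    out = bytearray()
    pos = 0
    while len(out) < size:  # a truncated file raises IndexError here
        code = nibbles[pos]
        if code >= 2:
            out.append(table[code - 2])
            pos += 1
        elif code == 1:
            out.append(table[14 + nibbles[pos + 1]])
            pos += 2
        else:
            out.append(nibbles[pos + 1] | (nibbles[pos + 2] << 4))
            pos += 3
    text = bytes(out)
    return text.replace(b"\r", b"\r\n") if expand_newlines else text


print(decompress(DUMP))
# b'ECHO OFF\r\necho abcdefghijklmnopqrstuvwxyz'
```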

You see, the changes are really small (please excuse some errors in the old
VER100.DOC file concerning file positions, it should say: 0004-0011, 0012-
0021, 0022-0040.)
