DCM / Diskcomm file format specifications revision 1.0

From: Michael Current (aa700@cleveland.Freenet.Edu)
Date: 02/07/98-03:55:14 PM Z


From: ernest@wxs.nl (Ernest R. Schreurs)
Date: Thu, 05 Feb 1998 22:50:58 GMT

Here are the specifications for the Diskcomm file format.
Sorry that this post is rather long, but over time, lots of people
have asked for it, so I hope it will be appreciated anyway.
Please send me some feedback if you think this text is unclear.
Keep those XL's/XE's humming.


Diskcomm file format version 3.2, revision 1.0, February 1998.

Diskcomm archives represent the contents of diskettes used for the
Classic Atari computers.  Various pieces of information related to the
format and the contents of the diskette are stored in the archive
file.  To reduce the storage space requirements, compression
algorithms are applied to the data.  For some large archive files, it
may be necessary to split the archive into multiple files, in order to
be able to store the archive on diskettes.

On an Atari disk, data is organized in sectors.  These sectors are
numbered starting from 1.  There are various disk sizes.  The most
common ones are the standard diskettes.  Common diskette formats are
the single density diskette, which holds 720 sectors of 128 bytes, the
enhanced density diskette, which holds 1040 sectors of 128 bytes, and
the double density diskette, which holds 720 sectors of 256 bytes.
There are various other formats, but the single density and the
enhanced density are used most, since these are supported by the 1050
disk drive.  Other formats require an XF551, 815, or some third party
disk drive, like the Percom, Indus GT, Trak, Black Box with floppy
board, MIO, and the HDI, to name just a few.  Diskcomm will always use
1040 sectors for enhanced density diskettes, so for this type of
format, the number of sectors processed is defined by the format.  For
single density and double density disks, the maximum number of sectors
can be set anywhere from 1 to 9999.  By definition, the first three
sectors of any Atari disk contain 128 bytes, since this is considered
the boot area.  So the first three sectors of double density disks
will also contain only 128 bytes of data.  Diskcomm still stores these
sectors as sectors of 256 bytes within the archive, and the remaining
128 bytes will simply contain zeroes.
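
As a rough illustration of the sizes mentioned above, the following
Python sketch pads a sector to the size that is stored in the archive.
The names SECTOR_SIZE, DEFAULT_SECTORS and archive_sector are made up
for this example, they are not part of Diskcomm.

```python
# Illustrative only: these names are not part of Diskcomm itself.
SECTOR_SIZE = {"single": 128, "enhanced": 128, "double": 256}
DEFAULT_SECTORS = {"single": 720, "enhanced": 1040, "double": 720}


def archive_sector(raw: bytes, density: str) -> bytes:
    """Pad a sector read from disk to the size stored in the archive.

    The 128 byte boot sectors of a double density disk come out as
    256 bytes, with the second half filled with zeroes.
    """
    size = SECTOR_SIZE[density]
    if len(raw) > size:
        raise ValueError("sector is larger than the format allows")
    return raw + bytes(size - len(raw))
```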

The sectors of a disk are stored in the archive sequentially.  Sectors
of data within a Diskcomm archive are compressed.  While creating the
archive, Diskcomm examines the contents of each sector that is
processed.  Based on these contents, one of several compression
algorithms is used to reduce the amount of storage required for
representing the contents of this sector.  Sectors that contain
nothing but zeroes are considered empty sectors.  Empty sectors are
not stored in the archive.  A flag will be set in the information
stored for the preceding non-empty sector, to indicate this.  This
preceding sector will be followed by the sector number of the next
sector that contains data.  It is assumed that the diskette will be
formatted before writing the archive back to a diskette, and thus that
initially all sectors on the output disk will contain zeroes.
Therefore, there is no need to store empty sectors in the archive.  To
be able to skip these sectors when writing the archive back to disk,
the sector number included in the archive is used to skip these empty
sectors.

For sectors that contain data, the contents of the sector are compared
to the contents of the last preceding sector containing data.  Empty
sectors have no influence on this comparison, since they are skipped.
As noted before, a flag indicates whether a sector number follows the
sector data.  If the flag says that one does, the number of the
current sector is appended to the archive buffer, in the 6502 low/high
byte format (low byte first).  Then an
attempt is made to compress the current sector.  There are four
different compression algorithms that can be applied.  Each of them is
applied to the sector in turn, and if the result is successful, the
resulting compressed data is appended to the archive buffer, with the
type of compression prepended.  Older versions of Diskcomm used a
fifth algorithm.  This has been made obsolete by one of the remaining
four algorithms, so it is no longer applied when an archive is being
created.  However, some very old archives may still contain sectors
that were compressed by this algorithm.  If the sector data
cannot be compressed by one of the four algorithms, the data is stored
uncompressed.

Compression of sectors continues until memory runs out, or until there
are no more sectors left to process.  Due to memory limitations, there
is a maximum of just over 24K of data that can be stored in the
archive buffer.  When appending the compressed data to the buffer
causes the buffer to contain about 24K of compressed data, the buffer
is full, and it is flushed to disk.  A system that has more than 64K
of memory can hold multiple buffers before the data is actually
flushed to disk.  Each buffer load is considered to be a pass in the
compression of the disk.  A pass is a variable number of compressed
sectors, and it is considered complete when hex 5F02 (dec 24322) bytes
of data or more have been accumulated.  The end of pass information is
then appended to the pass.  A pass must be no larger than hex 6002
bytes.

Each pass starts with the header, which consists of two bytes.  The
first byte is either hex FA or hex F9.  When the archive is split up
into multiple files, this byte will contain hex F9, otherwise it will
contain hex FA.  The second byte of the header combines three pieces
of information.  The format of the original disk is indicated in bit 5
and bit 6 of the second byte.  Bit value 00 is used for single density
disks, bit value 01 is used for enhanced density disks, and bit value
10 is used for double density disks.  Bit value 11 is undefined.  Bits
0 to 4 are used to indicate the pass number.  Each pass is numbered
sequentially, starting at 1.  Since there are 5 bits available for
this, the highest possible pass number is 31.  Therefore, the largest
archive will be no larger than 31 times 24K, unless the pass count is
allowed to roll over to zero.  The high order bit of the second byte
(bit 7) is set when this pass is the last pass.

Since compression is started before asking what the user wants to do,
the question of dividing the archive into smaller files is only
presented to the user if there is more than one pass.  If all data can
be stored in one pass, this question is not presented, and an archive
with header type hex FA is created.  The first sector within a pass
will always be preceded by its sector number.
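
The two header bytes described above could be taken apart along the
following lines.  This is only a sketch in Python.  The names
PassHeader and parse_pass_header are invented for the example, and
rejecting bit value 11 is my own choice, since the format leaves it
undefined.

```python
from typing import NamedTuple

# Bits 5 and 6 of the second header byte, as described in the text.
DENSITIES = {0: "single", 1: "enhanced", 2: "double"}


class PassHeader(NamedTuple):
    multi_file: bool   # True for archive type F9, False for FA
    last_pass: bool    # bit 7 of the second byte
    density: str       # from bits 5 and 6
    pass_number: int   # bits 0 to 4


def parse_pass_header(data: bytes) -> PassHeader:
    """Decode the two bytes that start every pass."""
    if len(data) < 2:
        raise ValueError("a pass header is two bytes long")
    archive_type, info = data[0], data[1]
    if archive_type not in (0xF9, 0xFA):
        raise ValueError("not a Diskcomm pass header")
    density_bits = (info >> 5) & 0x03
    if density_bits == 3:
        # Assumption: treat the undefined value as an error.
        raise ValueError("undefined diskette type")
    return PassHeader(
        multi_file=(archive_type == 0xF9),
        last_pass=bool(info & 0x80),
        density=DENSITIES[density_bits],
        pass_number=info & 0x1F,
    )
```

For example, the bytes FA A1 would describe the first and last pass
of a single file archive of an enhanced density disk.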

Format description, values are in hex:

<Diskcomm archive> = {pass}
<pass> = <archive type> <pass information> <sector number>
         {sector data} <end of pass>
<pass information> = <last pass flag> + <diskette type> + <pass number>
<sector data> = <content type> [compressed data] [sector number]
<content type> = <sequential flag> + <compression type>
<archive type> = F9 | FA
<last pass flag> = 00 | 80
<diskette type> = 00 | 20 | 40
<end of pass> = 45
<sequential flag> = 00 | 80
<compression type> = 41 | 42 | 43 | 44 | 46 | 47
<sector number> = 0001 - 270F
<pass number> = 01 - 1F
<compressed data> = Sector contents, see below.

Format description in plain English.

Diskcomm archive: A Diskcomm archive consists of one or more passes.
When an archive is split into multiple files, each pass is stored in a
separate file.
Pass: A pass consists of an archive type code, followed by pass
information, followed by the starting sector number, followed by one
or more sector data packets, followed by the end of pass code.
Archive type: The archive type indicates whether this is a multi file
archive (F9) or not (FA).
Sector data: A sector data packet consists of one byte that indicates
the content type for the sector data packet.  After the content type,
the compressed data for the sector follows.  Its contents depend on
the type of compression, and it can contain any number of
bytes, from zero up to the length of the sector for the type of disk,
either 128 or 256 bytes.  The high order bit of the content type is
used to indicate whether or not a sector number will follow the
compressed data.  If this bit is zero, a sector number will follow the
data.  If this bit is one, there will not be a sector number following
the compressed data.
Sequential flag: This flag indicates whether or not the sector packet
contains a sector number.  If this flag has the value 00, a sector
number will follow the sector data.  If it has the value 80, there
will not be a sector number following the sector data, and the next
sector is the next sequential sector.
Content type: The high order bit of this byte is the sequential flag.
The remaining low order bits are the compression type.
Sector number: An unsigned sector number, which is two bytes.  The
first byte is the low order portion of the number, the second byte is
the high order portion of the number.  Normally ranging from 1 to 9999
decimal.
Pass number: A sequence number assigned to each pass.  Normally
ranging from 1 to 31 decimal.  This might roll over to zero after 31.
End of pass: The value hex 45.
Compression type: One of the following hex values: 41, 42, 43, 44, 46
or 47.  The meaning of these values is described below.
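
As a small illustration of the content type byte and the sector
number, here is a sketch in Python.  The function names are made up
for this example.

```python
def split_content_type(value: int):
    """Split a content type byte into (sequential flag, compression type).

    The flag is True when the high order bit is set, meaning that no
    sector number follows and the next sector is the next sequential one.
    """
    return bool(value & 0x80), value & 0x7F


def read_sector_number(data: bytes, pos: int) -> int:
    """Read a two byte sector number stored low byte first."""
    if pos + 2 > len(data):
        raise ValueError("sector number is truncated")
    return int.from_bytes(data[pos:pos + 2], "little")
```

With these, the byte C7 is an uncompressed sector (47) with the
sequential flag set, and the bytes 0F 27 form sector number 9999.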

Type 41, modify begin.

The compression is relative to the previous sector.  The sector data
contains only the beginning portion.  The last portion is not changed.
The first byte of the sector data specifies at what offset to start
modifying the sector.  The remaining bytes of the sector data are used
to modify the beginning portion of the sector.  This modification
takes place starting at the byte at the start offset, working towards
the beginning of the sector, up to and including the byte at offset
zero, the first byte of the sector.  This implies that the data bytes
are stored in reverse order in the sector data.
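
A decoder for this type might look like the following Python sketch.
The name decode_modify_begin is invented here.  The payload is
assumed to start right after the content type byte, and previous is
assumed to hold the contents of the previous sector.

```python
def decode_modify_begin(payload: bytes, previous: bytes) -> bytes:
    """Rebuild a type 41 sector from the previous sector.

    payload[0] is the start offset.  The following bytes are written
    at that offset and then downwards, ending at offset zero.
    """
    start = payload[0]
    if start >= len(previous):
        raise ValueError("start offset lies outside the sector")
    data = payload[1:start + 2]          # start + 1 data bytes
    if len(data) != start + 1:
        raise ValueError("sector data is truncated")
    sector = bytearray(previous)
    for i, value in enumerate(data):
        sector[start - i] = value        # stored in reverse order
    return bytes(sector)
```

In this sketch a type 41 packet takes up start + 2 bytes after the
content type byte.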

Type 42, 128 byte DOS sector.

This is an obsolete compression type that was used by early versions
of Diskcomm.  Those versions of Diskcomm supported only single
density diskettes, so this type of sector always represents 128 bytes.
Programs that decode archives should be aware of this.  Using it for
creating new archives is not recommended.  The sector data contains
five bytes.  The first byte of the sector data is used to initialize
the first 124 bytes of the sector.  The remaining four bytes are
stored in the last four bytes of the sector.
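
Following that description, a type 42 sector could be expanded as in
this Python sketch.  The name decode_dos_sector is made up for the
example.

```python
def decode_dos_sector(payload: bytes) -> bytes:
    """Expand the five bytes of a type 42 packet into 128 bytes."""
    if len(payload) < 5:
        raise ValueError("type 42 needs five bytes of sector data")
    fill = payload[0:1]
    # 124 copies of the first byte, then the last four bytes as stored.
    return fill * 124 + payload[1:5]
```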

Type 43, compressed sector.

The sector data contains substrings.  These substrings alternate
between uncompressed and compressed, starting with an uncompressed
substring.  Each of these substrings starts with a byte that specifies
the ending offset of the resulting data in the sector.  When this
ending offset position is reached, the end of the substring is
reached, and the byte at this ending offset is the starting position
for the next substring.  The starting position for the first substring
is at offset zero.  An uncompressed substring will contain as many
bytes as are needed to fill the sector from the start position up to,
but not including the end offset.  For uncompressed substrings, if the
starting position offset is equal to the ending offset, there is no
further data, so in effect, this is a null string.  This is used when
there are two portions of data within the sector that can be
compressed, without other data in between these portions.  The
uncompressed substring must be present, therefore a null string must
be used in this case.  Compressed substrings are always two bytes in
length.  The compressed substring starts with a byte that indicates
the ending offset.  The second byte contains the fill character.  The
portion of the sector starting at the start offset, up to, but not
including the ending offset, is set to the value of this fill
character.  After the compressed substring, another uncompressed
substring follows.

For double density disks, the ending offset for the last substring is
256.  Since there is only one byte to represent the ending offset,
this is stored as zero.  However, zero is an offset that can be used
for the first uncompressed string, to indicate that the first
uncompressed string is a null string.  The end of this type of
compressed sector is reached when all bytes in the sector have been
processed.  This can occur at the end of an uncompressed substring.
In this case, there will not be a compressed substring following the
uncompressed string.  Likewise, if it occurs at the end of a
compressed substring, there will not be an uncompressed string
following it.
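
The alternating substrings might be decoded as in the Python sketch
below.  The name decode_compressed is invented for the example.  It
assumes that an ending offset of zero means 256 everywhere except in
the very first substring of a 256 byte sector, where it is taken as a
null string.  That is my reading of the text above, so treat it as an
assumption.

```python
from typing import Tuple


def decode_compressed(payload: bytes, sector_size: int = 128) -> Tuple[bytes, int]:
    """Decode a type 43 packet.  Returns (sector, bytes consumed)."""
    sector = bytearray(sector_size)
    pos = 0            # read position in the payload
    offset = 0         # write position in the sector
    literal = True     # substrings alternate, uncompressed first
    first = True
    while offset < sector_size:
        end = payload[pos]
        pos += 1
        # Assumption: zero stands for 256, except in the first substring.
        if end == 0 and not first and sector_size == 256:
            end = 256
        first = False
        if not offset <= end <= sector_size:
            raise ValueError("bad ending offset")
        if literal:
            chunk = payload[pos:pos + end - offset]
            if len(chunk) != end - offset:
                raise ValueError("sector data is truncated")
            sector[offset:end] = chunk
            pos += len(chunk)
        else:
            # One fill character, repeated up to the ending offset.
            sector[offset:end] = bytes([payload[pos]]) * (end - offset)
            pos += 1
        offset = end
        literal = not literal
    return bytes(sector), pos
```

For example, the bytes 03 41 42 43 80 20 give a 128 byte sector that
holds "ABC" followed by 125 spaces.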

Type 44, modify end.

The compression is relative to the previous sector.  The sector data
contains only the ending portion.  The beginning portion is not
changed.  The first byte of the sector data specifies at what offset
to start modifying the sector.  The remaining bytes of the sector data
are used to modify the ending portion of the sector.  This
modification takes place starting at the byte at the start offset, up
to, and including the last byte of the sector.
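
A decoder for this type might look like the following Python sketch.
The name decode_modify_end is invented here, and previous is assumed
to hold the contents of the previous sector.

```python
def decode_modify_end(payload: bytes, previous: bytes) -> bytes:
    """Rebuild a type 44 sector from the previous sector.

    payload[0] is the start offset.  The bytes after it replace
    everything from that offset up to the end of the sector.
    """
    size = len(previous)
    start = payload[0]
    if start >= size:
        raise ValueError("start offset lies outside the sector")
    data = payload[1:1 + size - start]
    if len(data) != size - start:
        raise ValueError("sector data is truncated")
    return previous[:start] + data
```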

Type 45, end of pass.

This compression type indicates the end of a pass, so it is not a real
compression type.  There is no sector data for this type.  For a multi
file archive, this indicates the end of the file.  The archive is
continued in the next file, unless this pass was the last pass.  For
single file archives, this indicates that the next pass follows within
this file, unless this was the last pass.  The next pass starts with a
header again, followed by a sector number.

Type 46, same as before.

This compression type indicates that the data for this sector is
identical to the data of the previous non-zero sector.  There is no
sector data for this type.

Type 47, uncompressed sector.

The sector data contains the number of bytes required to fill an
entire sector, either 128 or 256 bytes.  No compression of any kind is
performed on this sector type.

Previous sector.

The buffer that holds the contents of the previous non-zero sector is
initialized at the start of a pass if the archive is a multi file
archive.  For single file archives, this buffer is cleared at the
start of the first pass only.

Known bugs and anomalies.

This specification induces some anomalies.  When the last sector in a
pass has the flag set that indicates that a sector number must follow
it, this sector number has no meaning, since the next pass will always
start out with a sector number.  Diskcomm might not have the next
sector available.  Therefore, it cannot always determine whether or
not it is an empty sector.  Since a sector number must be included
once we set this flag, a fake sector number is appended.  The value
hex 0045 is used for this.  This is also true for the last pass.  Note
that this is stored in the low/high byte order.
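
To illustrate how a reader can cope with this, the Python sketch
below walks through one pass and lists the sector number of each
packet.  To keep it short it handles only types 46 and 47, and the
name walk_pass is made up for the example.  The fake sector number
is read like any other, and is simply never used because the end of
pass code follows it.

```python
def walk_pass(data: bytes, sector_size: int = 128):
    """Return a list of (sector number, compression type) for one pass."""
    pos = 2                                   # skip the two header bytes
    sector = int.from_bytes(data[pos:pos + 2], "little")
    pos += 2
    found = []
    while True:
        content = data[pos]
        pos += 1
        kind = content & 0x7F
        if kind == 0x45:                      # end of pass
            break
        if kind == 0x47:                      # uncompressed sector
            pos += sector_size
        elif kind != 0x46:                    # 46 carries no sector data
            raise NotImplementedError("only types 46 and 47 in this sketch")
        found.append((sector, kind))
        if content & 0x80:
            sector += 1                       # next sequential sector
        else:
            # May be the fake number 0045 at the end of a pass.
            sector = int.from_bytes(data[pos:pos + 2], "little")
            pos += 2
    return found
```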

Diskcomm processes sectors in chunks of 18 sectors, or chunks of 9
sectors if the disk is double density.  These chunks might include
empty sectors.  The last sector in these chunks will always be
followed by a sector number, since Diskcomm does not read ahead to
determine the contents of the next sector.  This is not a requirement.
On creating an archive, Diskcomm just happens to do this.  So a sector
number might be included even if the next sector is non-zero.

It looks like Diskcomm has some slight problems.  Double density
sectors are 256 bytes long.  If the buffer contains hex 5EFF bytes,
and the sector cannot be compressed, and a sector number must be
included, we must add 259 bytes to the buffer.  To mark the end of
pass, we have to add either one hex 45 byte, or hex 45 00 45.  This
might add up to three extra bytes.  The pass would be hex 6003 bytes
long.  This makes the pass longer than hex 6002 bytes.  On reading,
this is also a problem.  Diskcomm will not store the first two bytes,
since the two header bytes are read and processed first.  Then it
tries to read hex 6000 bytes.  Within these hex 6000 bytes, the end of
pass code must be included.  This will be missing though, so Diskcomm
will not be able to process the file.  This problem only occurs with
double density disks in the specified exceptional conditions.
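
The arithmetic of this case can be checked with a few lines of
Python.  The variable names are only for illustration.

```python
PASS_COMPLETE = 0x5F02   # a pass is complete at this many bytes or more
PASS_LIMIT = 0x6002      # a pass must be no larger than this

buffer_len = 0x5EFF      # just below the point where the pass is complete
packet = 1 + 256 + 2     # content type, uncompressed sector, sector number
end_of_pass = 1          # the hex 45 byte

total = buffer_len + packet + end_of_pass
print(hex(total), total > PASS_LIMIT)
```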

When a pass contains exactly hex 6002 bytes, Diskcomm will terminate
processing after this pass.  Therefore, passes should be less than hex
6002 in length.  This can only occur with archives of double density
disks.

For unknown reasons, the passes above pass number 31 have their pass
number reduced by one.  Only the five low order bits are stored.

For multi-file archives, a selected character of the filename is
incremented for each pass.  This will eventually cause an invalid
character to be used in the filename, depending on the restrictions
imposed by the DOS used.

Send comments to:

Ernest R. Schreurs
Kempenlandstraat 8
5211 VN  Den Bosch
The Netherlands
ernest@wxs.nl
-- 
Michael Current, mailto:mcurrent@carleton.edu
8-bit Atari FAQ and Vendor Lists, http://www.faqs.org/faqs/atari-8-bit/
Cleveland Free-Net Atari SIG, telnet://freenet-in-c.cwru.edu (go atari)
St. Paul Atari Computer Enthusiasts, http://www.library.carleton.edu/space/

