FreeArc 0.36 documentation

Introduction

The purposes of creation

Installation on a computer

Format of a command line

The list of commands

Search of files

The list of options

Setting options

Details

Inquiries to the user

Protection and restoration of data

Differences from RAR

Technical restrictions

Adjustment of solid-compression

Sorting of files for solid-compression

Updating of solid-archives

Adjustment of compression

Types of files

Decoding of algorithm of compression

Adjustment of algorithms of compression

Block algorithms

Use of memory

Config-file arc.ini

Section [Compression methods]

The information for developers

Addition in the program of new algorithms of compression


Introduction

The purposes of creation

Many existing archivers use a single algorithm to implement the whole range of compression modes, from the fastest to the tightest. As a result, the speed of such an archiver differs by only 5-6 times between these modes, and the "extreme" modes remain ineffective: they give no substantial improvement in speed or compression over the default mode. Moreover, one compression algorithm simply cannot suit all types of data equally well. Advanced users therefore have to keep a whole set of archivers at hand and choose the most suitable one for each particular task. In addition, most archivers have a long history, and because of backward-compatibility requirements the compression algorithms they use are far from ideal.

Therefore I set myself the goal of creating a new-generation archiver with a wide range of compression speeds for different tasks, from 100 kb/sec up to 10 Mb/sec, that squeezes out the maximum possible compression at every speed. To reach this goal, FreeArc uses three of the best modern compression libraries: LZMA by Igor Pavlov, PPMD by Dmitry Shkarin and GRZipLib by Ilya Grebnov. Each has its own strengths: GRZipLib packs quickly, PPMD compresses text well, and LZMA packs binary files well and unpacks very fast. When packing, FreeArc groups files by data type (text, binary, multimedia, etc.) and chooses the most suitable compression algorithm for each group. As a result, the user gets:

·        a choice of the best compression algorithm for the required compression speed

·        a choice of the compression algorithm best suited to the particular data being packed

And note that all of this is done inside one universal archiver that is suitable (I hope :)) for all occasions.

 

The second goal was to create an archiver that is:

·        reliable

·        easily portable to other OSes

·        easily extended with new features (data recovery, file generations, multilingual support, GUI)

·        capable of processing large amounts of data quickly and reliably

To achieve this, the archiver is implemented in Haskell, a very high-level language that is portable, guards against programming errors and makes it easy to express the most complex algorithms.

 

My third goal was to create a sample of a modern archiver whose source code any interested person can easily study and adapt to their own purposes. In particular, there are now many fine compression libraries that are not widely used only because their authors are not going to write full-featured archivers. Attaching such a library to an existing archiver solves this problem easily.

Installation on a computer

To install FreeArc, copy the files Arc.exe, Arc.groups and Arc.sfx to a directory included in PATH. Optionally, you can also copy the file arc.ini there; it contains my own settings.

To add FreeArc archive support to FAR, find the directory where the FAR plugin MultiArc stores its format files (...\Program Files\Far\Plugins\MultiArc\Formats) and copy FreeArc.fmt there.


Format of a command line

Working with FreeArc is made as similar as possible to working with the console versions of RAR and 7-Zip. FreeArc has the same command line format, uses command and option names compatible with theirs (preference is given to RAR as the more popular one), and in general behaves the way you have come to expect from these programs. Therefore most FreeArc features are described only briefly; detailed descriptions are given only for features unique to FreeArc.

 

The format of a command line is far from unique :)

Arc Command Archive [Filenames...] [@Filelists...] [Options...]

File names can be given as masks using the standard wildcards “?” and “*”. Names and masks of the files to process can also be read from filelists, where they must be written one per line. If no filenames or filelists are specified, “*” is used by default. Options can be specified anywhere on the command line. The format above describes a single command. Several commands can also be executed at once (one after another) by separating them with “;” surrounded by spaces, for example:

Arc a ../archive2 -m2 -r -t ; a ../archive3 -m3 -r -t ; a ../archive4 -m4 -r -t
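The way one command line is divided into several commands can be pictured with a short Python sketch. It is only an illustration: it assumes that “;” arrives as a separate argument (which is why the blanks around it matter), and the function name split_commands is hypothetical, not part of FreeArc.

```python
def split_commands(args):
    """Split a flat argument list into commands at standalone ';' tokens."""
    commands, current = [], []
    for arg in args:
        if arg == ";":
            # A standalone ';' closes the current command (empty ones are skipped)
            if current:
                commands.append(current)
            current = []
        else:
            current.append(arg)
    if current:
        commands.append(current)
    return commands


line = "a ../archive2 -m2 -r -t ; a ../archive3 -m3 -r -t ; a ../archive4 -m4 -r -t"
for command in split_commands(line.split()):
    print(command)
```

Each resulting list holds one command with its own archive name and options, to be executed in turn.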

By default, options are also read from the file arc.ini located in the same directory as the program, and from the FREEARC environment variable. This can be switched off with the option -cfg-.

The list of commands

a

To add files to the archive. Files with the same names that already exist in the archive are overwritten

c

To set the archive comment. It is equivalent to the command “y -z”

create

To create a new archive. Deletes the archive with the given name if it already exists, and then works like the command “a”

cw

To save archive comment to a file: Arc cw archive outfile.txt

d

To remove files from archive

e

To extract files from the archive to the current directory (or to the directory specified by the option -dp)

f

To update files in the archive. It is equivalent to the command “a -f”

j

To join archives. The files specified on the command line must be archives. Their contents are added to the base archive. If the base archive and one of the added archives contain files with identical names, only the file from the added archive is kept in the resulting archive. The base archive need not exist before the command runs; in this case the existing archives are joined to create a new one. Examples:

Arc j new.arc old1.arc old2.arc

Arc j new old [34]

Arc j All_Together c:\* -r

k

To protect the archive from further changes. It is equivalent to the command “y -k”

l

To print the list of files in the archive

m

To move files and directories to the archive. It is equivalent to the command “a -d”

mf

To move files to the archive. It is equivalent to the command “a -df”

r

To repair a damaged archive using its recovery record

rr [NN]

To add a recovery record to the archive. It is equivalent to the command “y -rr[NN]”

t

To test files in the archive by unpacking them in memory and checking the CRC

u

To update files in the archive and to add new files. It is equivalent to the command “a -u”

v

To print a technical archive listing intended mainly for archiver shells

x

To extract files from the archive with full paths

y

To copy the archive, making changes to it. For example, these changes can be adding a comment to the archive, protecting it from further changes, etc. If file names are specified on the command line, only these files are copied. The files specified in the option “-x” are excluded from the archive (one could say that the command “y” leaves in the archive exactly those files that the command “d” would remove :)). Unlike RAR, this command can be used to repack an archive with new options, for example: Arc y archive -m5 --recompress

Search of files

All commands fall into two large groups. For commands that create, update and join archives, the command line lists files located on DISK. For all other commands (deleting files from an archive, changing an archive, unpacking, testing and listing), the command line lists names of files inside the ARCHIVE.

 

Let's look at an example archiving command: “Arc a backup makefile *.cpp *.h @projectfiles”. When it runs, all references to filelists (here, projectfiles) are first replaced with their contents. Filelists can contain file names and masks, one per line, but cannot contain options. Next, the files matching each specified name or mask are searched for on disk. If the option -r is used, files matching each name or mask are searched for not only in the current directory but also in its subdirectories. If a file specification includes a directory name, files are searched for in that directory and all its subdirectories; for example, the command “Arc a sources rtl\*.c -r” packs all files with the extension “c” in the directory RTL and its subdirectories. If no file names are specified on the command line, as in “Arc a archive”, the mask “*” is used automatically, i.e. all files in the current directory (and its subdirectories, if the option “-r” is specified) are archived.

The archive name can also be given as a mask; in that case archive files matching the mask are searched for in the current (or specified) directory. For example, the command “Arc a c:\archives\* great.nfo” adds the file great.nfo to all archives located in the directory c:\archives. If the archive name specified on the command line does not contain “.”, the standard extension “.arc” is automatically added to it; the only way to prevent this is to use the option --noarcext.
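The rule for the default extension can be sketched in a few lines of Python. This is an illustration, not FreeArc's code: the function name archive_name is hypothetical, and checking only the last path component for “.” is an assumption (it is consistent with names such as ../archive2 used in the examples of this document).

```python
import ntpath  # understands both "/" and "\" as path separators


def archive_name(name, noarcext=False):
    """Return the archive name with ".arc" appended when it has no extension."""
    base = ntpath.basename(name)  # assumption: only the file name part is checked
    if noarcext or "." in base:
        return name
    return name + ".arc"


print(archive_name("backup"))
print(archive_name("backup.zip"))
print(archive_name("backup", noarcext=True))
```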

The command “j” is similar in every respect to the archiving commands, except that the names on the command line are archive names; therefore, if these names have no extension, the standard extension “.arc” is also added to them. In particular, the command “Arc j All_Together c:\* -r” can be used to merge all archives on disk C into one.

If the option -dp (base directory on disk) is specified, the files to archive are searched for relative to this directory, as if it were the current one.

  

For all other commands, the names specified on the command line denote files inside the archive. References to filelists (@listfile) are replaced with their contents in the same way. A name given without a directory matches files in any directory; for example, *.txt matches both readme.txt and src\history.txt. A name given with a directory matches only files in that directory; files in any other directory, including its subdirectories, do not match. With the option --fullnames, names without directories match only files in the root directory, i.e. *.txt no longer matches src\history.txt.
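These matching rules can be expressed as a small Python sketch. It is a model of the description above rather than the archiver's real code: the function name matches is hypothetical, and Python's fnmatchcase is used as a stand-in for mask matching (it also understands “[...]” classes, which this document does not mention).

```python
from fnmatch import fnmatchcase


def matches(mask, stored_name, fullnames=False):
    """Tell whether a command-line mask selects a file stored in the archive."""
    mask = mask.replace("\\", "/")
    name = stored_name.replace("\\", "/")
    name_dir, _, name_base = name.rpartition("/")
    if "/" in mask or fullnames:
        # Full-name mode: the directory must be identical,
        # only the last component is treated as a mask
        mask_dir, _, mask_base = mask.rpartition("/")
        return mask_dir == name_dir and fnmatchcase(name_base, mask_base)
    # Base-name mode: the directory of the stored file is ignored
    return fnmatchcase(name_base, mask)


print(matches("*.txt", "src\\history.txt"))
print(matches("*.txt", "src\\history.txt", fullnames=True))
```

The first call reports a match and the second does not, mirroring the *.txt example in the text.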

The name of the archive to process can be given as a mask, and if it has no extension, .arc is added automatically. In these commands the option “-r” means a recursive search in subdirectories for ARCHIVES TO PROCESS. So the command “Arc t c:\* -r” tests all archives on disk C, and the command “Arc d c:\* *.bak -r” removes the files “*.bak” from them.

In these commands the option -dp specifies the directory on disk that is treated as the current one when extracting files from the archive.

 

For all commands, the option “-x” excludes some of the found files: the option “-xcommon.h” excludes files named common.h, “-x*.bak” excludes all files with the extension “bak”, and “-x@exclude.lst” excludes all files whose specifications are listed in the file exclude.lst. If the name or mask does not include the symbol “/” or “\”, it is compared with the so-called «base name» of the file (i.e. the name without the directory); otherwise it is compared with the full file name. To exclude a whole directory, use "-xdir/*".

The option -ap sets the base directory inside the archive: when archiving, it is prepended to the names of packed files; when unpacking, it is removed from the names, and files that do not belong to this directory are ignored completely.
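How the base directory inside the archive and the base directory on disk work together at unpacking can be sketched as follows. This is a minimal Python illustration under assumptions: the function name extraction_path is hypothetical, paths are normalised to “/”, and the real archiver may treat edge cases differently.

```python
import posixpath


def extraction_path(stored_name, arcpath="", diskpath=""):
    """Map a name stored in the archive to a path on disk, or None if skipped."""
    name = stored_name.replace("\\", "/")
    prefix = arcpath.replace("\\", "/").strip("/")
    if prefix:
        # Files outside the base directory inside the archive are ignored
        if not name.startswith(prefix + "/"):
            return None
        name = name[len(prefix) + 1:]
    # The base directory on disk acts as the current directory for extraction
    return posixpath.join(diskpath, name) if diskpath else name


print(extraction_path("dir1/readme.txt", arcpath="dir1", diskpath="dir2"))
print(extraction_path("other/notes.txt", arcpath="dir1", diskpath="dir2"))
```

With the base directory dir1 inside the archive and dir2 on disk, a file stored as dir1/readme.txt lands in dir2, while files from other directories are skipped.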

In masks you can use “?” (matches any single symbol) and "*" (matches any sequence of symbols, including an empty one). Masks are not supported in directory names.

Archives processed by a command and the archiver's temporary files are automatically excluded from processing, so there is no danger that they will be added to an archive or overwritten by files extracted from an archive.

If no files remain in the archive after an update operation (for example, all files were removed, no files were found for archiving, or none of the files to archive could be opened), the created archive is deleted.

The list of options

Each option has a long name; most options also have a short one.

-ac

--ClearArchiveBit

To reset the Archive attribute of files on disk after successful packing/unpacking

-ad

--adddir

To add the name of the archive being unpacked (without extension) to the name of the directory to unpack into. For example, the command “Arc x * -ad” unpacks all archives in the current directory into subdirectories with the same names. When used together with the option -dp, the archive name is appended at the very end

-ag

--autogenerate

Automatic generation of the archive name. Information about the current date and time is appended to the name specified on the command line, for example “backup.arc” -> “backup20050302114328.arc”. The string appended to the archive name can also be specified explicitly (it uses the format of the C function strftime())
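The name generation can be reproduced with Python's strftime, which follows the same conventions as the C function. In this sketch the function name autogenerate is hypothetical, and the default format string is an assumption inferred from the example above (year, month, day, hours, minutes, seconds).

```python
from datetime import datetime
import ntpath


def autogenerate(archive, now, fmt="%Y%m%d%H%M%S"):
    """Insert a formatted timestamp between the archive name and its extension."""
    stem, ext = ntpath.splitext(archive)
    return stem + now.strftime(fmt) + ext


# 2 March 2005, 11:43:28, the moment encoded in the example above
print(autogenerate("backup.arc", datetime(2005, 3, 2, 11, 43, 28)))
print(autogenerate("backup.arc", datetime(2005, 3, 2, 11, 43, 28), fmt="-%Y-%m-%d"))
```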

-ao

--SelectArchiveBit

To select for archiving only files with the Archive attribute set (does not work when unpacking, in particular because file attributes are not yet stored in the archive)

-ap

--arcpath

The base directory inside the archive

-cfg-

--config-

To switch off reading of default options from the file arc.ini and the FREEARC environment variable

-d

--delete

To remove successfully archived files and directories after packing

-df

--delfiles

To remove successfully archived files after packing (but to not delete directories)

--display

Controls the amount of information about the compression process that the program prints. The option takes a set of characters, for example --display=rts, where each character enables printing of a certain piece of information:

·         a – the name of the created archive

·         c – the compression method

·         m – the amount of memory used

·         d – the final information about the archive directory (printed only if the directory is larger than 10 kb)

·         r – the final information about the compression ratio

·         t – the final information about the compression speed

·         s – summary of summaries :) for commands that have processed more than one archive

The setting “adrts” is used by default; the option --display without a parameter enables printing of all available information. Note that everything is written to the logfile in any case

-dm

--dirmethod

Compression method for the archive directory and other service blocks. It accepts everything the option -m accepts. The default is -dm=lzma:512k. If you do not want the archive directory to be packed, use the option -dm0. Please note that the FAR MultiArc plugin works only with archives whose directories are packed with the lzma or storing methods (since other methods give no advantage for compressing directories, support for them was not compiled into the plugin).

-dp

--diskpath

The base directory on disk. For example, the command “Arc x archive -apdir1 -dpdir2” extracts all files from the directory dir1 inside the archive to the directory dir2 on disk

-ds

--definesort

To define the sort order of files for archiving, or to switch sorting off altogether. See details in the section Sorting of files for solid-compression.

-ed

--nodirs

To not add empty directories to the archive

-f

--freshen

To update existing files (at packing or unpacking)

-fn

--fullnames

In the commands d, e, x, t, l and v, the file names specified on the command line refer to files inside the archive. This raises a question: do they denote the full name of a file, or only the name without the directory? The answer: a name is treated as a full name if it includes “/” or “\”, or if this option is enabled

-i

--indicator

Choice of the progress indicator:

·        -i0 – switches off printing of the indicator, including in the window title

·        -i1 – shows only the total progress indicator

·        -i2 – additionally prints the name of each processed file, as most archivers do

-k

--lock

To protect the archive being created from further changes

-kb

--keepbroken

When unpacking: to not delete files in which errors were found during unpacking. You can try to recover data from these files manually.

When packing: to not delete the created temporary archive file even if errors were found in it

 -lc

--LimitCompMem

To limit the amount of memory used by the compression algorithm. By default the program uses no more than 75% of the computer's physical memory; with this option you can remove this limit or change it. See Use of memory

-ld

--LimitDecompMem

To limit the amount of memory that will be needed to unpack the created archive. Note that this option affects only commands that create or modify archives! See Use of memory

-m

--method

Compression method. The options -m1..-m9 select various packing modes, from the fastest, which uses only 16 Mb of memory, up to the slowest, which uses 4 gb both for packing and unpacking. The options -m1x..-m9x select the asymmetric compression algorithm (LZMA), which features very fast, memory-cheap decompression but provides slower and less tight compression. In the -m3 mode the program compresses at the level of RAR, only 2-3 times faster; in the default mode (-m4) it compresses at the level of 7-Zip and UHARC, again 2-3 times faster. See other details in the section Adjustment of compression. The mode -m0 (no compression) is also supported, as is traditional.

-md

--dictionary

To set the block/dictionary size for the compression algorithms. It adjusts the algorithm selected by the option -m

-ms

--StoreCompressed

To not try to compress files that are already packed. Whether a file is already packed is determined from its name, so mistakes are possible, and therefore this option is not enabled by default. See details in the section Types of files.

-o

--overwrite

Mode of overwriting files when unpacking:

·        -op – to ask before overwriting (default setting)

·        -o+ – to overwrite all files without asking

·        -o- – to not overwrite any file, without asking

See details in the section Inquiries to the user.

-pt

--pretest

Mode of testing the archive before an operation is performed on it:

·        -pt0 – to test nothing (can also be written as -pt-)

·        -pt1 – to check the recovery record; if it is absent, to do nothing (default setting)

·        -pt2 – to check the recovery record; if it is absent, to test the archive (can also be written as -pt and -pt+)

·        -pt3 – to check the recovery record and then to test the archive

-r

--recursive

Recursive search of the specified files in the specified (or current) directory and its subdirectories

-rr

--recovery

To add a recovery record of the specified size to the archive. It is given in one of these forms:

·         -rrSIZE – to add a recovery record of the size specified in bytes, for example -rr1mb

·         -rrN% or -rrNp – to add a recovery record of the size specified as a percentage of the archive size, for example -rr2%

·         -rr- – to disable further addition of the recovery record

The option -rr without a parameter adds a recovery record of the default size: 4% for archives up to 500 kb, 2% for archives up to 2 Mb, and 1% otherwise (if the archive already contains a recovery record, its previous size is used). A recovery record makes it possible to repair contiguous damage to the archive if the damage does not exceed the size of the recovery record; when the damage is scattered, the probability of successful repair decreases.
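The default sizing rule can be written down as a short Python sketch. The function names are hypothetical; treating “kb” and “Mb” as binary units and the limits as inclusive are assumptions, since the text does not state them.

```python
KB = 1024
MB = 1024 * KB


def default_recovery_percent(archive_size):
    """Default recovery record size, as a percentage of the archive size."""
    if archive_size <= 500 * KB:
        return 4
    if archive_size <= 2 * MB:
        return 2
    return 1


def default_recovery_bytes(archive_size):
    """Approximate size in bytes of the default recovery record."""
    return archive_size * default_recovery_percent(archive_size) // 100


for size in (100 * KB, 1 * MB, 700 * MB):
    print(size, default_recovery_percent(size), default_recovery_bytes(size))
```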

-s

--solid

Controls the size of the solid block and the splitting of the archive directory into parts. By default the solid block size is 1 gb, and the directory is split into segments describing 20 thousand files each. The option -s16m changes the solid block size to 16 Mb, -s100f creates solid blocks of 100 files each, and -se creates a separate solid block for each file extension. See the detailed description of this option in the section Adjustment of solid-compression.

-t

--test

To test the created archive after packing. Further actions, including deletion of the archived files, are carried out only if testing succeeds. An archive that turns out to be broken is deleted unless the option -kb is set

-tk

--keeptime

To keep the modification time that the archive had before the operation

-tl

--timetolast

To set the archive modification time to the modification time of the newest file in the archive

-u

--update

To update existing files and to add new ones (at packing or unpacking)

-w

--workdir

To set the directory for temporary files. When an archive is updated, its new version is created in this directory; when packing finishes, the old archive is deleted and replaced by the new one. If the options -t and -w are used together, the created archive is tested twice: before it is moved from the temporary directory and after that

-x

--exclude

To exclude the given files, or files listed in the given filelists, from processing, for example: Arc a backup -r -xcommon.h -x*.bak -x@exclude.lst -xdir/*. See details in the section Search of files.

-y

--yes

Automatically answer "yes" to all user prompts (convenient for unattended operation). See Inquiries to the user

-z

--arccmt

To add a comment to the archive:

·         (by default) – the comment is copied from the original archive

·         -z- – the comment is removed when the archive is updated

·         -z – a new comment is entered from stdin

·         -zFILENAME – a new comment is read from the given file

The archive comment is printed during all operations on the archive. It is especially valuable for self-extracting archives, since it is printed before unpacking and lets the user decide whether to unpack the given package

 

--append

To add new files to the end of the archive without recompressing existing solid blocks. In particular, this makes it possible to create an archive in parts, for example using different options at each step:

arc a sfiction -r -s16m -m5b *.doc

arc a sfiction -r -s1m -m4t -x*.doc --append

See Updating of solid-archives

 

--cache

To set the size of the read-ahead cache. Caching speeds up archiving of a large number of small files, when most of the archiver's running time is spent not on compression itself but on reading all these files from disk; the speed-up can reach 30-50%! See details in the section Use of memory

--crconly

To not write the packed data to the archive, but to calculate and store the CRC of the archived files. This option can be used to check later whether the files have changed

--debug

To enable printing of debugging information. This option is intended for internal testing of the program

--dirs

To add empty directories to the archive. Without this option, FreeArc adds empty directories only when all files are being added, i.e. when the mask “*” is specified explicitly or implicitly. This option is the opposite of -ed (--nodirs)

 

--groups

Name of the file that defines the sort order of files for better compression and describes file types. By default it is arc.groups in the directory where the program is located. See details about its format in the sections Sorting of files for solid-compression and Types of files.

--logfile

To specify the name of a logfile where information about all performed operations and error messages is written. This option is convenient to set in arc.ini or in the FREEARC environment variable, for example: --logfile=c:\temp\freearc.log

--noarcext

If the name or mask of an archive is specified without an extension, the extension “.arc” is added automatically. The only way to prevent this is to use the option --noarcext :). This also applies to the names of archives joined by the command “j”. This option is intended mainly for configuring archiver shells.

--nodata

To not write the packed data to the archive, leaving only the archive directory. This option can be used to catalogue files instead of packing them. Moreover, a usual archive can be turned into a catalogue of files with the command Arc y archive --nodata --recompress

 

--nodir

To not write the directory of archived files to the archive. This option is of interest only for testing archivers, to merge a set of small files into one big file

--print-config

To print the definitions of compression methods. See details in the section Config-file arc.ini

--recompress

When updating an archive, repacks all data using the current compression settings. This option can be used together with the command y to repack an archive, and also when the compression ratio of the archive is more important to you than the time needed to update it (see Updating of solid-archives)

--sync

Synchronization of archive contents with the archived files. It simultaneously updates out-of-date files in the archive, adds new files and deletes files that have disappeared. It is ignored when unpacking. This option is intended mainly for future use with archives that store generations of files; however, it can also be used to quickly update an archive according to disk contents without repacking already compressed files

--

To stop processing options. After "--", any file names can be used on the rest of the command line, including names that begin with the symbol “-”

Setting options

Most options are simple boolean flags and are set without any arguments. The opposite value is either used by default or can be set by the same name with the prefix “no”, for example --dirs and --nodirs. For many options the default value can be restored by using two minuses as the parameter, for example -z--; this is useful when you need to cancel an option value set in the config-file arc.ini

If an option accepts a parameter, it can be given directly or after a “=” symbol; for example, the options -dsen and -ds=en are equivalent. The use of “=” is meant to make the command line more readable, especially with long option names. For the same purpose, by the way, every option has a long variant of its name. I recommend using long option names in batch files, scripts, archiver shell settings and so on.

The options -s and -m have intricate formats, which are described in separate sections: Adjustment of solid-compression and Adjustment of compression.

In options that accept a size in bytes as a parameter, the size can be given in the following ways:

·         with the suffix “b”, which means bytes: --cache65536b

·         with the suffix “k”, which means kilobytes: --cache256k

·         with the suffix “m”, which means megabytes: --cache10m

·         with the suffix “g”, which means gigabytes: --cache1g

·         with the suffix “^”, which means a power of two: --cache23^ sets the cache to 2^23 bytes, i.e. 8 mbyte
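The suffix rules above can be summarised in a small Python parser. It is a sketch, not the archiver's real parser: the function name parse_size is hypothetical, only the five suffixes listed above are handled, and the units are taken to be binary, as the 2^23 bytes = 8 mbyte example suggests.

```python
import re

UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '256k' or '23^' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([bkmg^])", text.lower())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    number, suffix = int(match.group(1)), match.group(2)
    if suffix == "^":
        return 2 ** number          # "N^" means two to the power N
    return number * UNITS[suffix]


for value in ("65536b", "256k", "10m", "1g", "23^"):
    print(value, parse_size(value))
```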


Details

Inquiries to the user

The only prompt to the user that is currently supported is the request to overwrite a file existing on disk with a new one extracted from the archive. It has the form “Overwrite <file> (y/n/a/u/s/q)?”, and the answers are interpreted as follows:

·        “y” means “Yes” – to overwrite the file

·        “n” means “No” – to keep the file already existing on disk

·        “a” means “Always” – to overwrite all files, skipping all the remaining prompts

·        “u” means “Update” – to overwrite only older files with newer ones, skipping all the remaining prompts

·        “s” means “Skip” – to not overwrite any file, skipping all the remaining prompts

·        “q” means “Quit” – to exit the program

·        any other answers are automatically considered a display of disrespect for the program :)

 

Whether overwrite prompts appear depends on the value of the option “-o”:

·         -op – to ask before overwriting (default setting)

·         -o+ – to overwrite all files without asking (similar to the answer “a”)

·         -o- – to not overwrite any file (similar to the answer “s”)

 

Which files the program offers to overwrite also depends on the options -f and -u (see The list of options). The option -y answers "yes" to all user prompts (in the future, not only overwrite prompts), which makes it possible to run the program unattended.
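The interplay of the option -o, the option -y and the answers to the prompt can be modelled as a small state machine. The Python sketch below is only a model of the description in this section: the class name OverwritePolicy is hypothetical, repeating the question after an unrecognised answer is an assumption, and “Update” is interpreted as overwriting only when the archived copy is newer.

```python
class OverwritePolicy:
    """Tracks the overwrite mode across files during one extraction."""

    def __init__(self, option="p", assume_yes=False, ask=input):
        # -op asks, -o+ always overwrites, -o- never overwrites
        self.mode = {"p": "ask", "+": "always", "-": "never"}[option]
        self.assume_yes = assume_yes  # models the option -y
        self.ask = ask                # replaceable, so prompts can be scripted

    def decide(self, filename, archived_is_newer):
        """Return True if the file on disk should be overwritten."""
        while self.mode == "ask":
            if self.assume_yes:
                return True
            answer = self.ask(f"Overwrite {filename} (y/n/a/u/s/q)? ").strip().lower()
            if answer == "y":
                return True
            if answer == "n":
                return False
            if answer == "q":
                raise SystemExit("cancelled by user")
            # a/u/s are remembered; anything else repeats the question (assumption)
            self.mode = {"a": "always", "u": "update", "s": "never"}.get(answer, "ask")
        if self.mode == "always":
            return True
        if self.mode == "update":
            return archived_is_newer
        return False


# Scripted answers instead of keyboard input: "n" for the first file, then "u"
scripted = iter(["n", "u"])
policy = OverwritePolicy("p", ask=lambda prompt: next(scripted))
print(policy.decide("a.txt", archived_is_newer=True))
print(policy.decide("b.txt", archived_is_newer=False))
print(policy.decide("c.txt", archived_is_newer=True))
```

After the answer “u” no further questions are asked: the third file is decided from the remembered mode alone.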

Protection and restoration of data

FreeArc provides facilities for protecting and restoring data similar to those available in RAR and other archivers: an additional block of data, called the recovery record, can be included in the archive; it is formed by XOR operations over the sectors of the archive. This block makes it possible to check the integrity of the archive before any operation on it, and to restore the contents of the archive when failures are detected. In practice it copes successfully with data corruption caused by failures of magnetic media. The use of data protection and restoration is divided into three stages:

·         Use the option -rr to add a recovery record while archiving, or the command rr to add a recovery record to an existing archive

·         Before any operation on an archive, its contents are automatically checked using the recovery record. If failures are detected, no further actions are performed; however, the archive is not repaired automatically. This precheck is controlled by the option -pt, and with the appropriate setting pretesting can be performed even when the archive has no recovery record, by fully testing the archive. The option -t enables full testing of the archive after it is updated. You can enable the maximum number of checks with the combination of options -pt3 -rr -t :)

·         If the program finds a failure in the archive, it advises you to use the command r to repair it. This command uses the recovery record to restore the contents of the archive and writes the restored archive to the file fixed.<arcname>. The restored archive does not contain a recovery record, so you will have to add a new one

It should also be noted that if a failure affects the central directory of the archive and the directory cannot be restored completely, all data in the archive are lost! Unfortunately, splitting the central directory into small pieces (for example, with the option -s8m;) does not solve the problem completely, since the program cannot yet search for "lost" pieces of the archive's table of contents during restoration. Therefore the reliability of data protection is currently worse than in RAR. This is planned to be fixed in future versions of the program.
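The general idea of XOR-based recovery can be shown with a toy Python example. It does not reproduce the actual layout of FreeArc's recovery record, which this document does not describe: the sector contents, the single parity block and the function name xor_blocks are all assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


sectors = [b"ARC1", b"data", b"more", b"tail"]  # made-up 4-byte "sectors"
recovery = xor_blocks(sectors)                   # parity block stored with the archive

lost = 2                                         # pretend this sector became unreadable
survivors = sectors[:lost] + sectors[lost + 1:]
restored = xor_blocks(survivors + [recovery])    # XOR of the rest yields the lost sector
print(restored == sectors[lost])
```

One parity block can rebuild only one missing sector of its group, which is in line with the remark that restoration works best when the damage does not exceed the size of the recovery record.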

Differences from RAR

Although FreeArc aims at maximum compatibility with RAR, in that commands and options with the same names behave identically, there are a few small differences between them:

·         -md64 means a dictionary of 64 Mb

·         the argument of the option -ag has a different format; note that when -ag is used without a parameter, both programs work identically

·         the option -rr is “sticky”: its argument is saved inside the archive and is applied by all subsequent commands that update this archive, until a new value of -rr is explicitly given on the command line; therefore, to switch off subsequent addition of the recovery record to the archive, use the option -rr-

·         the command r does not scan the archive in search of repairable directories; it restores only what can be restored by means of the recovery record

Technical restrictions

The sizes of processable files and of the archive are limited to 2^63 bytes, and the number of files in an archive to 2 billion. Neither restriction is fundamental, and both will be raised as soon as the need arises.

In practice, the number of files in an archive is limited only by the amount of physical memory available to the program. The information about one file occupies about 300 bytes of memory. Considering that memory is also required for the packing and unpacking algorithms, the operating system, other programs, etc., a machine with 512 Mb of memory can work with archives containing up to one million files.

When only part of the files is unpacked from an archive, the memory used is proportional to the number of files being unpacked. Thus you can unpack archives containing even millions of files in parts on a machine with as little as 32 Mb of RAM.
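The estimate above can be checked with a little arithmetic. The sketch below is only an illustration: the figure of 300 bytes per file comes from the text, while the function name and the use of binary megabytes (2^20 bytes) are assumptions.

```python
BYTES_PER_FILE = 300  # approximate per-file cost quoted in the text

def directory_memory_mb(num_files: int) -> float:
    """Rough memory needed to hold information about num_files files, in Mb."""
    return num_files * BYTES_PER_FILE / 2**20

for n in (100_000, 1_000_000, 5_000_000):
    print(f"{n:>10,} files: about {directory_memory_mb(n):,.0f} Mb")
```

For one million files this gives roughly 286 Mb, which leaves the remainder of a 512 Mb machine for the compression algorithms and the operating system.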

The program fully supports Unicode in file names when packing and unpacking. I have also made every effort to print such names correctly on the screen and to recognize them on the command line, but I cannot give a 100% guarantee in this area. Unicode in listfiles is not supported yet, and there may be problems with Unicode file names inside archiver shells such as FAR.

If the UTC offset changes between packing and unpacking a file (as happens in regions with daylight saving time if packing takes place in summer and unpacking in winter, or vice versa), the time of the unpacked file can differ by an hour from that of the original. As far as I understand, this is a common problem for Windows programs.


Adjustment of solid-compression

As is known, to improve compression of small files they should be combined into blocks, each of which is compressed as one big file. This process is controlled by the option -s. Examples of its usage:

-s16m

To create blocks of 16 Mb each. For the rules for specifying sizes, see the section Setting options

-s100

To create the blocks each containing 100 files

-se

To create the separate block for each filename extension

-s

To merge all files in one solid block

-s-

To disable solid compression, i.e. to create the separate block for each file

The option -s1gb is used by default. You can also use two or more criteria simultaneously; for example, -se10m100f creates blocks containing files with the same extension, but no more than 10 Mb and no more than 100 files per block. Note that "a solid block size of up to 10 Mb" is interpreted by the program as "the total size of files in the solid block, excluding the last file, must not exceed 10 Mb" or, in other words, "the amount of extra data unpacked when extracting a single file must not exceed 10 Mb".
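The size rule described above (the total size of all files in a block except the last one must not exceed the limit) can be sketched as follows. This is a minimal illustration and not the program's actual code: the function name, the (name, size) tuples and the exact handling of the boundary case are assumptions.

```python
MB = 2**20

def split_into_solid_blocks(files, max_bytes=None, max_files=None):
    """Group (name, size) pairs into solid blocks.

    A file joins the current block as long as the files already in the
    block do not exceed max_bytes, so only the last file may push the
    block over the limit. max_files caps the number of files per block.
    """
    blocks, current, current_size = [], [], 0
    for name, size in files:
        too_big = max_bytes is not None and current_size > max_bytes
        too_many = max_files is not None and len(current) >= max_files
        if current and (too_big or too_many):
            blocks.append(current)
            current, current_size = [], 0
        current.append((name, size))
        current_size += size
    if current:
        blocks.append(current)
    return blocks

files = [("a", 6 * MB), ("b", 6 * MB), ("c", 1 * MB)]
for block in split_into_solid_blocks(files, max_bytes=10 * MB):
    print([name for name, _ in block])
```

With a 10 Mb limit the files a and b share a block (only 6 Mb precede the last file), and c starts a new one.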

 

To increase reliability and to support creation of archives of unlimited size, FreeArc also supports splitting the archive directory into separate blocks, each written to the archive directly after the files it describes. By default the archive directory is split into blocks describing 20 thousand files each. This setting can be changed by writing it in the option -s before the ";". For example, the option -s100;1m splits the archive directory into blocks describing 100 files each and, inside each of them, groups files for solid compression into blocks of 1 Mb. Thus this setting can be changed only together with the solid-compression settings. If the option -s does not include ";", the directory block size takes its standard value of 20 000 files. Some more examples:

-s100;

Directory blocks of 100 files each. Each solid block covers all files of one directory block

-s;

One directory and one solid block for the whole archive

-s;100

One directory for the whole archive, solid blocks of 100 files each

-s

Directory blocks of 20 000 files each, each solid block covering all files of one directory block (so the description of this option given above, "To merge all files in one solid block", is slightly inexact :))

Thus, the option used by default -s1gb is equivalent to -s20000;1gb
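The way the ";" separates the directory setting from the solid-block setting can be illustrated with a small parser. This is a sketch under assumptions: the function name is hypothetical, both parts are kept as uninterpreted strings, and forms such as -s- or -s=7z are not handled.

```python
DEFAULT_DIR_BLOCK = "20000"  # standard directory block size, in files

def split_solid_option(option: str):
    """Split an '-s...' option into (directory part, solid-block part).

    An empty directory part means one directory for the whole archive;
    an empty solid part means no extra limit on the solid block.
    """
    if not option.startswith("-s"):
        raise ValueError("not an -s option: " + option)
    value = option[2:]
    if ";" not in value:
        # No ';' given: the directory block size takes its standard value.
        return DEFAULT_DIR_BLOCK, value
    directory, solid = value.split(";", 1)
    return directory, solid

for opt in ("-s1gb", "-s100;1m", "-s100;", "-s;", "-s;100", "-s"):
    print(opt, "->", split_solid_option(opt))
```

In this sketch -s1gb and -s20000;1gb produce the same pair, matching the equivalence stated above.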

 

This option can also be specified so as to emulate the directory structure of existing programs:

-s=7z

Equivalent to "-s;"

-s=cab

Also equivalent to "-s;", but additionally switches off compression of the archive directory, as cabarc does

-s=zip

Equivalent to "-s;1", i.e. each file is compressed independently of the others, and a single directory block is created for the whole archive

-s=arj

Equivalent to "-s1;1", i.e. each file is compressed independently of the others and is immediately followed by the directory block describing it

Sorting of files for solid-compression

To improve the compression ratio, files in solid compression should be sorted so that files with similar contents end up close to each other. The option -ds, which sets the sorting order, serves this purpose; for example, -ds=gen. The letters after -ds are interpreted as follows:

c

Split the already formed group into two parts: files up to 128 kb are sorted according to the subsequent criteria, and files larger than 128 kb are sorted by size

e

Sort by extension

g

Sort by the groups described in the file arc.groups

i

Group files by the first three letters of the name and sort inside these groups by the remaining criteria; groups consisting of a single file are gathered together and sorted by size

n

Sort by file name inside the directory

p

Sort by directory name

r

Reorder files inside an already sorted group, placing files with similar names and/or sizes together. This makes it possible, for example, to bring consecutive versions of the same file closer together

s

Sort by size

t

Sort by file date/time

 

For example, -ds=gen means sorting first by group from arc.groups (the letter 'g'), then inside each of these groups by extension (the letter 'e'), and inside extensions by base file name (the letter 'n'). This sorting order, which gives fairly decent results, is used in RAR and recent versions of 7-zip. In FreeArc the default setting is -ds=gercpn, which is even cooler :) : first, small files (<128 kb) are grouped by directory and sorted by name inside directories (the letters "pn"); second, larger files are sorted by size, which brings identical files together wherever they are located (the letter 'c'); third, similar files are placed next to each other (the letter 'r').
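A multi-level order such as "gen" can be modelled as a tuple sort key. The sketch below is an illustration only: it covers just the simple letters g, e, n, p, s and t (the heuristic letters c, i and r are left out), and the function names, the (path, size, mtime) tuples, the case-insensitive comparison and the flat list of masks are all assumptions.

```python
import fnmatch
import posixpath

def group_index(base, masks):
    """Position of the first matching mask; '$default' catches the rest."""
    default = len(masks)
    for i, mask in enumerate(masks):
        if mask == "$default":
            default = i
        elif fnmatch.fnmatchcase(base.lower(), mask.lower()):
            return i
    return default

def make_sort_key(order, masks):
    """Build a key function for a string of sorting letters such as 'gen'."""
    def key(entry):
        path, size, mtime = entry
        directory, base = posixpath.split(path)
        parts = {
            "g": group_index(base, masks),
            "e": posixpath.splitext(base)[1].lower(),
            "n": base.lower(),
            "p": directory.lower(),
            "s": size,
            "t": mtime,
        }
        return tuple(parts[letter] for letter in order)
    return key

masks = ["*.txt", "$default", "*.exe"]
files = [("src/b.txt", 10, 0), ("a.exe", 5, 0), ("src/a.txt", 7, 0), ("readme.md", 1, 0)]
for path, _, _ in sorted(files, key=make_sort_key("gen", masks)):
    print(path)
```

Each letter contributes one component of the key, so the earlier letters dominate and the later ones only break ties.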

However, alternative sorting orders such as -dsgepn, -dsgeipn, -dsgenp and -dsges can occasionally improve compression. If you wish to disable sorting (which speeds up packing of many small files but worsens compression), use the option -ds without a parameter. Default sorting is automatically switched off when solid compression is disabled (-s-), when compression is disabled (-m0), when the fastest compression algorithm is used (-m1), or with fake compression (--nodata or --crconly); if you want files to be sorted in these modes, set the option -ds=gercpn (or whatever order you need) explicitly.

By default the list of groups is taken from the file arc.groups located in the same directory as the program. The option --groups allows you to specify another groups file. The order in which file masks are listed in it determines the order in which these files are placed in the archive. The mask $default marks the place for all remaining files. The format of arc.groups is fully compatible with that of rarfiles.lst, i.e. you can use a common groups file for both programs. However, for the automatic choice of packing algorithm, multimedia compression and the option -ms to work, descriptions of file types must be added to the groups file (see Types of files).

Updating of solid-archives

Most archivers offer only limited capabilities for updating solid archives. Unlike them, FreeArc can update any solid archive, unpacking the old data and repacking it in parallel threads.

When updating solid archives, FreeArc inserts new files between existing ones according to the sorting order in use. For example, if the archive already contains the files arc.hs and decompress.hs, the file compress.hs will be inserted between them (with default sorting). FreeArc recompresses only those solid blocks in which files were changed, added or removed. Moreover, it can create solid blocks slightly larger or smaller than specified by the option -s in order to reduce the amount of data that needs repacking. All this minimizes the time spent updating solid archives while preserving their characteristic high compression ratio.
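The insertion example can be reproduced with a binary search over an already sorted list. This is only a sketch: it assumes that, for files of the same group, extension and directory, the sorting order reduces to plain name order, and the function name is hypothetical.

```python
import bisect

def insert_sorted(names, new_name):
    """Return (position, new list) after inserting new_name in name order."""
    position = bisect.bisect_left(names, new_name)
    return position, names[:position] + [new_name] + names[position:]

position, updated = insert_sorted(["arc.hs", "decompress.hs"], "compress.hs")
print(position, updated)
```

Knowing the insertion position is what would let an archiver decide which solid block is affected and leave the others untouched.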

If, when adding new files to an archive, you want to avoid repacking the existing data entirely, use the option --append. Keep in mind, however, that many add commands with repacking prohibited can produce an archive consisting of many small solid blocks. On the other hand, you can use the option --recompress to fully repack the data in an archive according to the current compression settings; in this case even solid blocks whose contents have not changed are recompressed. In most cases it is better to rely on the intelligence of the program and not use these options.


Adjustment of compression

The option -m, which sets the compression method, has many capabilities. We will examine them from simple to complex. The simplest way to choose a compression method is one of the options -m1 to -m9. The degree of compression increases from -m1 to -m9, and so, of course, do the running time and memory use. For -m3 a machine with 64 Mb of RAM is enough, for -m4 128 Mb, etc.

There is also a "parallel" line of compression methods, from -m1x to -m9x. These methods require exactly the same amount of memory for packing (64 Mb for -m3x, etc.) but use much less memory and time for unpacking. The price is a slight loss in compression ratio and packing speed.

Finally, the compression methods -m5p and -m6p are "strengthened" versions of -m5/-m6 that use the external ppmonstr.exe instead of ppmd to compress text files. They therefore work more slowly but give an even higher compression ratio.

There are also a few unusual compression options: -m5q/-m6q use ppmonstr to compress not only text but also binary files, which does not always increase the compression ratio (compared with -m5p/-m6p) but always slows the program down. -m3r/-m4r/-m5r use slower and stronger compression for text files (compared with -m3/-m4/-m5), using the same amount of memory and without calling external programs. Using arc.ini, you can create your own compression methods.

The following table shows the characteristics of the basic compression methods, averaged over a large volume of test data (tests were run on a computer with a 1 GHz processor and 256 Mb of memory):

 

Method  Compression ratio  Packing speed, kb/sec  Unpacking speed, kb/sec  Memory for packing  Memory for unpacking
-m1     2.053              5001                   18430                    16 mb               16 mb
-m2     5.088              1836                   3967                     32 mb               32 mb
-m3     5.555              1010                   3137                     64 mb               64 mb
-m4     6.184              577                    1966                     128 mb              128 mb
-m5     6.297              419                    1588                     256 mb              256 mb
-m5p    6.603              284                    612                      256 mb              256 mb

Modes with fast unpacking
-m1x    2.053              5001                   18430                    16 mb               16 mb
-m2x    4.869              1539                   8207                     32 mb               2 mb
-m3x    5.430              1029                   8659                     64 mb               4 mb
-m4x    6.031              472                    8766                     128 mb              8 mb
-m5x    6.168              378                    8742                     256 mb              16 mb

 

The following table gives the results of other archivers on the same data set:

 

Method        Compression ratio  Packing speed, kb/sec  Unpacking speed, kb/sec  Memory for packing  Memory for unpacking

RAR 3.61 (-md4096 -s)
-m1           3.519              2823                   16538                    32 mb               4 mb
-m2           4.753              853                    19106                    32 mb               4 mb
-m3           5.038              638                    18926                    32 mb               4 mb
-m5           5.355              547                    2434                     64 mb               32 mb

7-zip 4.43
-mx1          3.933              1492                   9537                     280 kb              64 kb
-mx3          4.750              1268                   10891                    16 mb               1 mb
-mx5          5.481              408                    11693                    64 mb               4 mb
-mx7          5.935              282                    10435                    256 mb              16 mb
-mx9 -md=16m  5.993              286                    9958                     256 mb              16 mb

UHARC 0.6 (-md32768)
-mz           4.946              1747                   1649                     32 mb               32 mb
-m1           5.559              297                    2893                     256 mb              16 mb
-m3           5.799              195                    3043                     256 mb              16 mb
-mx           6.353              230                    259                      64 mb               64 mb

As you can see, most archivers provide rather limited ability to trade compression ratio against speed. The following table matches FreeArc compression modes with modes of other archivers that are comparable in speed/compression:

FreeArc  PKZIP  ZIP  RAR  7-zip  UHARC
-m1      -es    -1   -m1  -mx1
-m2      -ex    -9   -m3  -mx3   -mz
-m3                  -m5  -mx5   -m1
-m4                       -mx9   -m3
-m5                              -mx

Types of files

In each compression mode (-m1…-m9x) the compression algorithms used are chosen according to the file type: some algorithms are applied to text files, others to binary files, and still others to multimedia. The type of a file is determined by its extension. For this purpose the format of the file arc.groups, which describes the sorting order of files, is slightly extended to also include descriptions of file types in the form of labels $text, $binary, etc., placed before the masks of text files, binary files and so on, respectively:

$text

readme.*

*.txt

*.doc

$binary

*.pdf

$default

$compressed

*.7z

*.arc

*.rar

*.zip

$wav

*.wav

This description means that the files readme.*, *.txt and *.doc are text files, 7z/arc/rar/zip are already packed ones, etc. The label $default, as in RAR, denotes the "other" files that do not match any explicitly specified mask. In this example the other files are placed in the group $binary. So, in this example all files are divided into 4 groups: $text, $binary, $compressed and $wav.
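How such a groups file maps file names to types can be sketched as follows. This is an illustration under assumptions, not the program's parser: the function names are hypothetical, matching is assumed to be case-insensitive with the first matching mask winning, and masks listed before any label are assumed to belong to $binary.

```python
import fnmatch

GROUPS = """
$text
readme.*
*.txt
*.doc
$binary
*.pdf
$default
$compressed
*.7z
*.arc
*.rar
*.zip
$wav
*.wav
"""

def parse_groups(text):
    """Return (mask, type label) pairs in file order."""
    current, rules = "$binary", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("$") and line != "$default":
            current = line          # a type label opens a new section
        else:
            rules.append((line, current))
    return rules

def file_type(name, rules):
    """Type of a file: first matching mask, else the section holding $default."""
    default = "$binary"
    for mask, label in rules:
        if mask == "$default":
            default = label
        elif fnmatch.fnmatchcase(name.lower(), mask.lower()):
            return label
    return default

rules = parse_groups(GROUPS)
for name in ("README.TXT", "manual.pdf", "photo.jpg", "backup.zip", "song.wav"):
    print(name, "->", file_type(name, rules))
```

A file such as photo.jpg matches no mask and therefore lands in the section that contains $default, which is $binary in this example.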

To achieve maximum compression of your own data, make sure that all extensions of text, sound, graphic and already packed files typical of your data are included in the sections $text, $wav, $bmp and $compressed, respectively. Files with unknown extensions go into the section $binary, so for binary files it is less critical to list their extensions in this file (but do not forget that arc.groups also describes the sorting order). Remember to use the option -ms if you need to disable compression of files from the group $compressed.

Decoding of compression algorithm

The compression method specified in the option -m passes through a series of substitutions. Let us look at the result of their work using the default compression mode, -m4, as an example. First of all, -m4 is transformed into -m4b/4t. This notation means "use compression mode -m4t for text files (the group $text) and -m4b for all others". You can specify similar modes yourself, for example:

·        the option -m4b means "like -m4, but treat all files as binary"

·        the option -m4b/3t means "use compression -m4b for binary files and -m3t for text files", i.e. compress binary files at the -m4 level and text files at the -m3 level

All simple notations of compression methods (-m1 .. -m9x) are transformed in a similar way; for example, -m9x turns into -m9xb/9xt.

 

Then compression methods specific to individual file types are added to this notation. They have the form $type=method. In particular, if the option -ms is used (do not pack already compressed files), the notation for -m4 becomes -m4b/4t/$compressed=0, where "0" (i.e. -m0) is the compression method applied to files in the group $compressed.

Then compression methods for multimedia files are added to this list, and it becomes -m4b/4t/$compressed=0/$wav=wav/$bmp=bmp. All of this is still a perfectly valid notation for the option -m that can be given on the command line. It reads as "compress files from the group $bmp with the algorithm bmp, the group $wav with the algorithm wav, the group $compressed with the algorithm 0, the group $text with the algorithm 4t, and finally all remaining files with the algorithm 4b".
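The first substitution steps described so far can be sketched as a small string transformation. This is only an illustration of the steps in the text, not the program's substitution engine: the function name is hypothetical, only the simple forms N and Nx are accepted, and the result is written without spaces around "/".

```python
import re

def expand_simple_method(method: str, ms: bool = False) -> str:
    """Expand a simple method such as '4' or '9x' as described in the text."""
    if not re.fullmatch(r"[1-9]x?", method):
        raise ValueError("only simple methods N or Nx are handled: " + method)
    parts = [f"{method}b/{method}t"]      # binary / text variants
    if ms:
        parts.append("$compressed=0")     # -ms: store already packed files
    parts += ["$wav=wav", "$bmp=bmp"]     # multimedia methods
    return "/".join(parts)

print("-m" + expand_simple_method("4", ms=True))
print("-m" + expand_simple_method("9x"))
```

The later steps (adding the exe preprocessor and replacing abbreviations such as 4b with real algorithm chains) depend on substitution tables built into the program and are not modelled here.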

 

At the next step the first element of the notation is decoded again, and it turns into -mexe+4b/$obj=4b/$text=4t/$compressed=0/$wav=wav/$bmp=bmp. This is due to the need to put files with relocatable object code (*.obj, *.lib, etc.) into a separate group to which exe preprocessing, which worsens their compression, is not applied. The notation "exe+4b" means "apply the algorithm exe to the source data, then compress its output with the algorithm 4b". You can chain up to 8 algorithms, although in practice it only makes sense to apply preprocessors first (exe, mm, lzp, dict) and to finish the chain with a packer (lzma, ppmd, grzip).

 

Finally, it is time to replace the abbreviations used (4b, 4t) with real compression methods or, more precisely, whole chains of methods. After these transformations the notation turns into

-mexe+rep:64mb+lzma:8mb/$obj=rep:64mb+lzma:8mb/$text=dict:p+lzp+ppmd:8:96mb/$wav=tta/$bmp=mm+grzip

And here I have still omitted half of the details. As you can see, the abbreviated notation for compression methods is a real lifesaver :). The rep and lzma that you see here are real compression algorithms rather than abbreviations. Parameters that override their default values are specified after colons.

 

When you need fine control over the choice of compression algorithms, you can always specify in the option -m how to compress individual types of files, or from which methods and with which parameters to build the compression algorithm. Use a notation like -m4b/3t if you wish to set compression methods separately for text and binary files. Add entries of the form $type=method if you wish to specify compression methods for particular file types, for example -m4/$html=2t or -m4b/3t/$html=2t. Finally, use real compression algorithms (lzma, ppmd, dict, etc.) and chains of methods (dict+lzp+ppmd) instead of abbreviations (4b, 2t) if you need to build a non-standard compression algorithm. The option --display lets you see the results of your creativity :).

Keep in mind that special compression methods for multimedia files are added to the command line only when options of the form -mN or -mNx are used. If you write the basic compression method (i.e. the part before "$") in a more expanded form, you will have to add them manually. Thus, the option -m4 is exactly equivalent to -m4b/4t/$wav=wav/$bmp=bmp. You can also add algorithms for individual file types to the current compression settings, using an option of the form -m$type=method; for example, -m5 -m$wav=lzma is fully equivalent to the option -m5/$wav=lzma. The option -ms is exactly equivalent to -m$compressed=0.

 

You can print the full list of substitutions built into the program with the command arc --print-config, and extend it with your own substitutions by defining them in arc.ini. On the whole, this system of substitutions is intended to let you configure compression as flexibly as possible while keeping it convenient to choose the necessary settings in everyday use.

Adjustment of algorithms of compression

Let us return to the expanded form of the default compression algorithm:

-mexe+rep:64mb+lzma:8mb/$obj=rep:64mb+lzma:8mb/$text=dict:p+lzp+ppmd:8:96mb/$wav=tta/$bmp=mm+grzip

As you can see, the compression algorithm for each file type consists of one or several processes separated by plus signs. The first processes in a chain act as preprocessors that reduce the redundancy of a file, and the last process performs the final compression. The name of each process can be followed by additional parameters separated by colons, but they are optional: all settings have default values. The following are usually used as the compression algorithm:

·        for text files (including program source code, html/xml/ps files and other file formats that are essentially plain text), use grzip, ppmd or pmm, depending on the required level of speed/compression. grzip:m4 is the fastest compression algorithm for texts; grzip:m1 is close to ppmd, with compression a few percent worse but unpacking twice as fast. To increase compression in ppmd/pmm, increase the model order and, at the same time, the amount of memory used. The speed of grzip is several Mb/sec, of ppmd 1 Mb/sec, of pmm 100-300 kb/sec. All these algorithms are (almost) symmetric, so memory requirements and speed for unpacking hardly differ from those for packing. Only grzip:m1 unpacks twice as fast. The algorithm pmm is implemented by calling the external program ppmonstr.exe, which must be present both for packing and for unpacking

·         for binary files (including, by the way, texts in multi-byte encodings), use lzma. lzma:fast gives fast compression comparable with RAR. Further speedup can be obtained with smaller values of the parameters fb and mc, for example lzma:fast:fb5:mc1. By default lzma uses strong compression; it can be strengthened by increasing the same parameters fb and mc, for example lzma:fb128. One more parameter is the dictionary size. Packing requires 10*dictsize bytes of memory, unpacking requires dictsize. Unpacking speed is 10 Mb/sec; packing speed is 200 kb/sec to 2 Mb/sec, depending on the chosen degree of compression. lzma can also be used for text data when a higher unpacking speed is needed

 

Using preprocessors before the main packing algorithm gains a few more percent of compression. For binary data the following preprocessors can be used:

·         rep removes repetitions at large distances. Since lzma can use a dictionary of only 1/16 of the computer's memory, whereas rep can use 1/2 of it, it is advantageous to combine them: first rep encodes repetitions at long distances, and then lzma performs the main compression. For example, rep:512mb+lzma:64mb is quite a suitable algorithm for a machine with 1 gb of memory, whereas lzma:512mb, of course, could not run on it. rep requires 1.25*dictsize of memory. Because the algorithms process data in large pieces, such a chain can work on a machine with 1 gb of memory, although there will, of course, be a fair amount of disk thrashing.

·         exe transforms executable code to improve compression. When used together with rep, it must precede it in the chain. It worsens compression on obj/lib files (i.e. files with relocatable object code), so FreeArc is set up to put these files into a separate group to which this preprocessor is not applied. It very slightly worsens compression of files that contain no executable code
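The memory figures quoted for lzma and rep can be combined into a rough estimate for a chain. The sketch below is only arithmetic on the factors given in the text (10*dictsize for lzma packing, 1.25*dictsize for rep packing); the function name and the assumption that the stages of a chain hold their memory at the same time, so that the total is a simple sum, are mine.

```python
MB = 2**20

# Packing memory as a multiple of the dictionary size, as quoted in the text.
PACK_FACTOR = {"lzma": 10.0, "rep": 1.25}

def packing_memory(chain):
    """Estimate packing memory for a chain of (algorithm, dictsize) pairs."""
    return sum(PACK_FACTOR[name] * dictsize for name, dictsize in chain)

combined = packing_memory([("rep", 512 * MB), ("lzma", 64 * MB)])
lzma_only = packing_memory([("lzma", 512 * MB)])
print(f"rep:512mb+lzma:64mb needs about {combined / MB:.0f} Mb")
print(f"lzma:512mb needs about {lzma_only / MB:.0f} Mb")
```

Under these assumptions the chain needs about 1280 Mb, somewhat more than 1 gb, which is consistent with the remark about disk thrashing, while lzma:512mb alone would need about 5120 Mb.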

 

For text data the following preprocessors can be used:

·         lzp is similar to rep: it finds repeated strings, but is designed for processing text data

·         dict finds frequently repeated byte sequences ("words") and replaces them with 1-2 byte codes

Neither of these preprocessors is used with grzip: lzp because it is already built into grzip, and dict because it makes little sense in combination with bwt/st. These preprocessors are especially beneficial for lzma and pmm, since they considerably speed up their work.

 

The full list of settings for each algorithm is given in the following table:

Parameter and its value by default

The description

PPMD

o10

The model order, i.e. the number of preceding symbols used to predict the next symbol. It can also be given without a prefix: ppmd:4

mem48mb

Amount of memory used for the model. It can also be given as "m96mb" or even "96mb", i.e. with the prefix "m" or with no prefix at all. For how memory sizes are specified, see the section Setting options

r0

Model update mode (0/1/2). "r" is equivalent to "r1"

PMM

o16

The model order, i.e. the number of preceding symbols used to predict the next symbol. It can also be given without a prefix: pmm:10

mem192mb

Amount of memory used for the model. It can also be given as "m96mb" or even "96mb", i.e. with the prefix "m" or with no prefix at all. For how memory sizes are specified, see the section Setting options

r1

Model update mode (0/1/2). "r" is equivalent to "r1"

LZMA

a1

Match search algorithm: "a0" is fast, "a1" is normal. With a0 it is better to use the hc4 match finder (see below), with a1 the bt4 one. Therefore the abbreviations fast = a0:hc4 and normal = a1:bt4 are provided as settings for fast and normal compression, respectively. The latter is the default. To use fast compression, write lzma:fast

d8mb

The maximum match search distance (the dictionary). It is specified as a memory size; see the section Setting options. This parameter can be given without a prefix: lzma:4m

fb32

The minimum length of a found match after which the search for a better (even longer) match stops. Decreasing this parameter can considerably increase packing speed at the cost of compression ratio. This parameter can be given without a prefix: lzma:4. Increase it to 128 if you need maximum compression

mc0

The maximum length of the search chain. The setting mc0 selects the default value, which depends on the parameters a and fb: mc = (a == 1) ? fb/2+16 : fb/4+8. The value of this parameter directly affects the trade-off between speed and compression ratio

lc3

The meaning of this parameter is known only to Igor Pavlov, the author of the LZMA algorithm :)

lp0

-.-

pb2

-.-

mfBT4

Match finder. After "mf" one of the following types can be given: "bt2", "bt3", "bt4", "hc4". A detailed description of them can be found in the 7-zip documentation. This parameter can be given without a prefix: lzma:hc4

GRZip

m1

Compression method (from the strongest to the fastest: m1 – BWT+WFC, m2 – BWT+MTF, m3 – ST4+WFC, m4 – ST4+MTF)

b8mb

The size of the block of data being packed. It is specified as a memory size; see the section Setting options. This parameter can be given without a prefix: grzip:4m

l32

Minimum match length used in LZP preprocessing. This parameter can also be given without a prefix: grzip:64. For text data slightly smaller values are better, for binary data slightly larger ones; 32 is a reasonably good compromise

h15

The logarithm of the number of elements in the hash table used in LZP preprocessing. Decreasing this parameter speeds up packing and unpacking but reduces compression, especially for data with a large number of short repeated strings, such as texts. For maximum compression this parameter can be increased to, say, 20

s

Use an alternative sorting algorithm in BWT mode. Of no practical value

a

Use a heuristic algorithm for splitting data into smaller blocks

l

Disable the LZP preprocessor. On some data (for example, book texts) this preprocessor gives no appreciable gain in compression but increases both packing and unpacking time

d

Enable a heuristic preprocessor for multimedia data

p

Disable all preprocessors (LZP, multimedia, splitting of data into smaller blocks). Note that by default all of them except LZP are disabled, and LZP can also be disabled by the option "-l". So this is rather a reserve for the future, when new preprocessors may be added to the GRZip algorithm

LZP

 

Note that the LZP algorithm is the same LZP preprocessor from GRZip, slightly modified by Dmitry Shkarin for more effective use together with ppmd/pmm. Its basic parameters therefore do not differ from those of the LZP preprocessor in the GRZip algorithm, although some new capabilities have been added

b8mb

The size of the block of data being packed. It is specified as a memory size; see the section Setting options. This parameter can also be given without a prefix: lzp:4m. The amount of memory required for both packing and unpacking is 2*blocksize

l64

Minimum match length. This parameter can also be given without a prefix: lzp:16. 64 is a reasonably good value for using LZP as a preprocessor for the PPMD algorithm; the author recommends using the value calculated by the formula ppmdOrder*10-15

h18

The logarithm of the number of elements in the hash table used in LZP preprocessing. Decreasing this parameter speeds up packing and unpacking but reduces compression, especially for data with a large number of short repeated strings, such as texts. For maximum compression this parameter can be increased to, say, 24, which requires 4*2^24 = 64 Mb of memory in addition to the above-mentioned 2*blocksize

d

The boundary beyond which the required match length decreases, which allows a two-level scheme to be built: say, at distances up to 8 Mb we select matches of at least 64 bytes, and beyond that of at least 32 bytes: lzp:64:d8m:s32

s32

The required match length beyond the boundary (if this parameter is greater than :l, its value is reset to :l)

100%

If a block of data compresses worse than the specified percentage, it is stored uncompressed.

REP

b64mb

The maximum match search distance (the dictionary). It is specified as a memory size; see the section Setting options. This parameter can be given without a prefix: rep:512m. The amount of memory required for packing is 1.25*dictsize, for unpacking 2*dictsize

l512

Minimum match length. This parameter can also be given without a prefix: rep:32. The best value is 512, because if rep starts "intercepting" shorter strings, the overall compression together with the subsequent lzma decreases

h0

The logarithm of the number of elements in the hash table. Decreasing this parameter speeds up packing and unpacking but reduces compression, especially for data with a large number of short repeated strings, such as texts. For maximum compression this parameter can be increased to, say, 24, which requires 4*2^24 = 64 Mb of memory for compression in addition to the above-mentioned 1.25*dictsize. The value 0 means that the program chooses the hash size itself, at the rate of 0.25*dictsize

a1

The "search amplification" factor. Values greater than 1 allow slightly more matches to be found. This parameter can be used to achieve maximum compression

d

The boundary beyond which the required match length decreases, which allows a two-level scheme to be built: say, at distances up to 8 Mb we select matches of at least 512 bytes, and beyond that of at least 32 bytes. This can make sense if lzma is limited to a distance of 8 Mb anyway: rep:512:d8m:s32+lzma:8m

s512

The required match length beyond the boundary (if this parameter is greater than :l, its value is reset to :l)

DICT

b64mb

The size of the block of data being packed. It is specified as a memory size; see the section on setting options. This parameter can also be given without a prefix: dict:32m. Memory required for packing is 1.5*blocksize, for unpacking blocksize

p

Tunes the algorithm's parameters for use together with the ppmd/pmm or bwt/st algorithms (by default they are tuned for use together with lzma). This setting tightens the word selection criteria, because with more powerful compression algorithms it is better not to encode rare words (say, those occurring only 30-100 times in a block)

100%

If a block of data compresses worse than the specified percentage, it is stored uncompressed. Example: dict:90%

EXE

 

This is the only algorithm that has no settings
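The memory figures quoted for rep above can be turned into a small calculation. The sketch below is only an illustration: the function names are hypothetical and are not part of FreeArc, and the formulas are simply the ones given in the parameter descriptions (1.25*dictsize for packing, 2*dictsize for unpacking, 4 bytes per hash entry, 0.25*dictsize when h is 0).

```c
#include <stdint.h>

/* Hypothetical helpers, not FreeArc code; formulas as quoted in the text. */

/* Size in bytes of the rep hash table: 2^h entries of 4 bytes each.
   h == 0 means the program picks the size itself: 0.25 * dictsize. */
uint64_t rep_hash_bytes (uint64_t dictsize, unsigned h)
{
    if (h == 0)
        return dictsize / 4;
    return (uint64_t) 4 << h;
}

/* Base packing memory quoted in the text: 1.25 * dictsize. */
uint64_t rep_pack_base_bytes (uint64_t dictsize)
{
    return dictsize + dictsize / 4;
}

/* Unpacking memory quoted in the text: 2 * dictsize. */
uint64_t rep_unpack_bytes (uint64_t dictsize)
{
    return 2 * dictsize;
}
```

With h set to 24, rep_hash_bytes gives the 64 MB mentioned in the description of the h parameter.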

Block algorithms

Block compression/preprocessing algorithms are those that read a large block of data, process it, write out the results, and then read the next block. Each block is thus compressed independently of the others. Of the algorithms currently implemented in the program, grzip, lzp and dict belong to this group. The block size is one of the parameters of these algorithms.

Use of memory

If the chosen compression algorithm needs more than 75% of the total physical memory, the program automatically reduces the size of the dictionary/block/... to fit what the computer actually has. This limit can be lifted with the -lc- option, or you can set your own limit on the memory used for packing, for example -lc128mb. You can also limit the amount of memory required to unpack the created archive with the -ld option. Memory sizes can also be given as a percentage of total physical RAM, for example -lc25%.

The -md option sets the block size (for block algorithms) or the dictionary size (for LZMA).

If the total amount of data packed into one solid block is smaller than the block/dictionary size, the block/dictionary size for that solid block is reduced to the actual amount of data.

When block algorithms are used with solid compression, the size of the created solid blocks is limited by the block size parameter of the algorithm used (larger solid blocks would not improve the compression ratio but would increase the time required to unpack individual files).

The program uses a read-ahead cache to speed up compression of small files. By default it tries to allocate 16 MB for the cache, or twice the data block size when block algorithms are used, whichever is larger. If necessary, this size is automatically reduced so that the cache plus the memory needed by the packing algorithm does not exceed 50% of the computer's RAM. You can set the size of this buffer yourself, for example --cache=100mb or --cache=40%. The --cache- option selects the minimum possible cache size (one block, i.e. 256 KB).
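The default cache sizing rule can be sketched as follows. This is a minimal illustration of the rule as described above, not FreeArc's actual code: the function name is hypothetical, a block size of 0 stands for "no block algorithm in use", and clamping the result to the 256 KB minimum when almost no memory is left is an assumption.

```c
#include <stdint.h>

#define MB        (1024ull * 1024ull)
#define MIN_CACHE (256ull * 1024ull)   /* one cache block, per the text */

/* Hypothetical helper: default read-ahead cache size in bytes.
   ram        - total physical RAM
   algo_mem   - memory needed by the packing algorithm
   block_size - block size of the block algorithm, 0 if none is used */
uint64_t default_cache_bytes (uint64_t ram, uint64_t algo_mem, uint64_t block_size)
{
    uint64_t want = 16 * MB;                 /* default: 16 MB ...            */
    if (2 * block_size > want)
        want = 2 * block_size;               /* ... or 2 blocks, if larger    */

    uint64_t budget = ram / 2;               /* cache + algorithm <= 50% RAM  */
    uint64_t avail  = budget > algo_mem ? budget - algo_mem : 0;
    if (want > avail)
        want = avail;

    if (want < MIN_CACHE)                    /* assumption: never below 256 KB */
        want = MIN_CACHE;
    return want;
}
```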


Config-file arc.ini

The archiver can be configured by means of a config file, arc.ini, which must be located in the same directory as Arc.exe. An example of a config file's contents:

 

; Default options for all commands

--logfile=c:\temp\freearc.log --display=rts

 

[Default options]

; Default options for specific commands

a create = -m5 -s128m

create = --display

 

[Compression methods]

6pt = dict:p + lzp:64mb:32:h22 + pmm:16:400mb

#d = #b / #xt     ; One more comment

#d$bmp = bmp

#d$wav = wavfast

 

The config file may contain comments beginning with ‘;’; a comment can occupy a separate line or the end of a line. The first line of the config file that is neither empty nor a comment may contain options common to all commands. The rest of the config file is divided into sections whose headings appear in square brackets: [Default options], [Compression methods]. The [Default options] section describes default options for individual commands. To the left of the ‘=’ sign are the commands to which the options apply, and to the right are the options themselves. If the same command appears on several lines, all the options set for it are combined. For example, the config file above sets the following default options for the ‘create’ command: -m5 -s128m --display.
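The way options for one command accumulate across several [Default options] lines can be sketched like this. The code is a hypothetical illustration, not FreeArc's parser: the function names are made up, the lines are assumed to have comments and section headings already stripped, and command names are assumed to be separated by spaces.

```c
#include <string.h>

/* Does the space-separated command list [list, list+len) contain cmd? */
static int has_command (const char *list, size_t len, const char *cmd)
{
    size_t clen = strlen (cmd);
    size_t i = 0;
    while (i < len) {
        while (i < len && list[i] == ' ') i++;       /* skip blanks   */
        size_t start = i;
        while (i < len && list[i] != ' ') i++;       /* scan one word */
        if (i - start == clen && strncmp (list + start, cmd, clen) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: joins the options of every "commands = options"
   line that names cmd. Returns 0 on success, -1 if out is too small. */
int default_options (const char *const *lines, size_t nlines,
                     const char *cmd, char *out, size_t outsize)
{
    size_t used = 0;
    if (outsize == 0)
        return -1;
    out[0] = '\0';
    for (size_t n = 0; n < nlines; n++) {
        const char *eq = strchr (lines[n], '=');     /* first '=' separates */
        if (eq == NULL || !has_command (lines[n], (size_t) (eq - lines[n]), cmd))
            continue;
        const char *opts = eq + 1;
        while (*opts == ' ') opts++;                 /* trim leading blanks */
        size_t olen = strlen (opts);
        size_t sep  = used ? 1 : 0;                  /* blank between parts */
        if (used + sep + olen + 1 > outsize)
            return -1;
        if (sep) out[used++] = ' ';
        memcpy (out + used, opts, olen + 1);         /* copies the NUL too  */
        used += olen;
    }
    return 0;
}
```

Called with the two [Default options] lines of the example config and the command name create, this helper produces the combined option string given in the paragraph above.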

Section [Compression methods]

The [Compression methods] section lets you define shorthand names for compression methods. All the compression methods built into the archiver are described in the same way; you can print their definitions with the command Arc --print-config and even paste them into the config file. Because definitions from the config file take priority over those built into the program, you can adjust them to your taste and use whatever parameters you consider necessary. Inside the archive, methods are stored in expanded form, so this does not cause any incompatibility. You can also create your own compression methods, whose short names make them easier to specify on the command line. For example, the config file above defines a method 6pt, which will be used to compress text files in -m6p. It also defines a whole family of compression methods -m1d..-m9d, which combine the binary-file compression used in -m# with the text-file compression used in -m#x.

The language used to describe compression options needs some explanation. When # is used in a description, the line is automatically replaced by 9 lines in which # takes the values 1 through 9, so the line

#d = #b / #xt

from the config file above is equivalent to:

1d = 1b / 1xt

...

9d = 9b / 9xt
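The # expansion just described amounts to a character substitution repeated for the digits 1 to 9. A minimal sketch, with a hypothetical function name that is not taken from the FreeArc sources:

```c
#include <string.h>

/* Hypothetical helper: writes to out the variant of a config line in which
   every '#' is replaced by the given digit (1..9).
   Returns 0 on success, -1 if the digit is out of range or out is too small. */
int expand_hash (const char *line, int digit, char *out, size_t outsize)
{
    size_t len = strlen (line);
    if (digit < 1 || digit > 9 || len + 1 > outsize)
        return -1;
    for (size_t i = 0; i <= len; i++)        /* i == len copies the NUL */
        out[i] = (line[i] == '#') ? (char) ('0' + digit) : line[i];
    return 0;
}
```

Calling it in a loop for digits 1 through 9 yields the nine lines listed above.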

Each line in this section defines a possible substitution used when expanding compression methods: the left-hand side of the line (up to the ‘=’ sign) is replaced by the right-hand side. Thus -m3d is replaced by -m3b/3xt, i.e. we define -m3d as a compression option equivalent to -m3b for binary files and -m3xt for text. The definitions of these methods can be seen with the command Arc --print-config:

3b = lzp:32m:256:h18:d4m:s20:90% + 3xb

3xt = dict:32m:80% + 3xb

As you can see, both entries refer to the definition of method 3xb, which is:

3xb = lzma:4m:fast:32

Thus -m3d will use lzp:32m:256:h18:d4m:s20:90% + lzma:4m:fast:32 to compress binary files, and dict:32m:80% + lzma:4m:fast:32 for text. Simple, isn't it? It only remains to add that the exe preprocessor will be added for files that may be executables. The full expansion of a compression method can be seen by specifying the --display option when archiving.
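The substitution chain above (3b refers to 3xb, which is finally replaced by an lzma method) can be sketched as a recursive lookup. This is a hypothetical illustration rather than FreeArc's implementation: struct method, lookup, expand and expand_method are invented names, only the ‘+’ separator is handled (not ‘/’ or ‘$’ groups), blanks around each element are dropped, and a recursion limit of 16 is an assumed guard against self-referencing definitions.

```c
#include <string.h>

struct method { const char *name, *def; };   /* one "name = def" line */

/* Finds the definition of token [tok, tok+len); the first match wins. */
static const char *lookup (const struct method *t, size_t n,
                           const char *tok, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen (t[i].name) == len && strncmp (t[i].name, tok, len) == 0)
            return t[i].def;
    return NULL;
}

/* Appends the expansion of spec to out. Returns 0 on success, -1 on error. */
static int expand (const struct method *t, size_t n, const char *spec,
                   char *out, size_t outsize, int depth)
{
    if (depth > 16)
        return -1;                                   /* assumed recursion guard */
    while (*spec) {
        size_t len = strcspn (spec, "+");            /* one '+'-separated part  */
        const char *tok = spec;
        size_t tlen = len;
        while (tlen && *tok == ' ') { tok++; tlen--; }   /* trim blanks */
        while (tlen && tok[tlen - 1] == ' ') tlen--;
        const char *def = lookup (t, n, tok, tlen);
        if (def != NULL) {
            if (expand (t, n, def, out, outsize, depth + 1) != 0)
                return -1;
        } else {                                     /* a real algorithm: emit it */
            size_t used = strlen (out);
            size_t sep  = used ? 1 : 0;
            if (used + sep + tlen + 1 > outsize)
                return -1;
            if (sep) out[used++] = '+';
            memcpy (out + used, tok, tlen);
            out[used + tlen] = '\0';
        }
        spec += len;
        if (*spec == '+') spec++;
    }
    return 0;
}

int expand_method (const struct method *t, size_t n, const char *spec,
                   char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;
    out[0] = '\0';
    return expand (t, n, spec, out, outsize, 0);
}
```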

 

The following lines are used to compress multimedia files with the appropriate algorithms:

#d$bmp = bmp

#d$wav = wavfast

An alternative way to express all of this would be a single line:

#d = #b / #xt / $bmp = bmp / $wav = wavfast

As you can see, this is very similar to the ways of specifying the -m option on the command line. Spaces can be used freely: they are all removed before actual use.

 

When, for example, the option -m3d is given on the command line, the config file is searched for lines beginning with 3d$, and their contents are appended to the expansion of the compression method. Thus -m3d is expanded to -m3b/3xt/$bmp=bmp/$wav=wavfast. Multimedia compression is added in exactly the same way for the usual -mN/-mNx methods, except that the corresponding substitutions are built into the program. You can print them with the command Arc --print-config. Note that the multimedia compression algorithms vary with the compression mode; this is done to achieve the required performance. Studying this listing, you may also notice that specific definitions that override general ones (say, 1x$bmp instead of #x$bmp) must be written above the general ones. The general principle is that of two lines with the same left-hand side, the first one has priority (remember that lines with # are simply turned into 9 lines in which # takes the values 1 through 9).


The information for developers

Addition in the program of new algorithms of compression

To begin, in the directory containing the FreeArc source code, create a subdirectory named "C_<method>", where <method> is the name of the algorithm being implemented. Place in this directory all the files needed to implement your algorithm. Base your makefile, C_<method>.h and C_<method>.cpp on the corresponding files from the C_GRZip directory. In all of them, replace references to grzip with the name of your compression method. Then edit the declaration of class C_<method> so that it includes all the parameters of your compression method and only those, and change the implementation of the class to match its declaration. Do not forget about parse_<method>.

 

All compression and preprocessing algorithms used in the program must conform to a standard interface: a packing function that receives the algorithm parameters needed for packing and pointers to callbacks for reading and writing data, and an unpacking function that receives the algorithm parameters needed for unpacking and pointers to the same callbacks. For example:

 

typedef int INOUT_FUNC (void *buf, int size);  // Declared in Compression.h

int superzip_compress (int dictionary, int level, INOUT_FUNC *read_f, INOUT_FUNC *write_f);

int superzip_decompress (int dictionary, INOUT_FUNC *read_f, INOUT_FUNC *write_f);

 

The packing/unpacking algorithm then does its work, calling read_f to obtain input data:

insize = read_f (buf, bufsize);

This function is passed the address of the buffer into which the input data should be read and the size of that buffer in bytes. It returns:

·        a negative number: an error code

·        zero: end of input data

·        a positive number: the number of bytes read

 

The write_f function is called to write output data:

result = write_f (buf, bufsize);

This function is passed the address of the buffer holding the output data and the number of bytes to write. It returns:

·        a negative number: an error code

·        zero or a positive number: success

 

On receiving a negative result from either of these functions, it is advisable to stop packing/unpacking immediately and return the received number as the result of the whole compress/decompress procedure. After read_f has returned zero, it must not be called again, otherwise the results may be unpredictable. All buffers are allocated by your procedures; once write_f returns, the buffer contents have already been written out and new data can be copied into the buffer. On return from packing/unpacking, all allocated memory blocks must be freed. Your code must include (by means of #include) the file "../Compression.h", which contains all the definitions needed to interface with FreeArc. It is advisable to read and write data in blocks of BUFFER_SIZE bytes (this constant is also defined in Compression.h).
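The callback rules above can be illustrated with a trivial "store" method that copies its input to the output unchanged. This is a sketch under stated assumptions, not code from FreeArc: store_compress, store_decompress, mem_read and mem_write are hypothetical names, the INOUT_FUNC typedef is repeated from the text so that the example compiles on its own, the BUFFER_SIZE value and the error codes -1 and -2 are invented for illustration, and returning 0 on success is an assumption. In a real method these definitions would come from "../Compression.h".

```c
#include <stdlib.h>
#include <string.h>

typedef int INOUT_FUNC (void *buf, int size);   /* as shown in the text */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE (64 * 1024)                 /* assumed value */
#endif

/* Hypothetical "store" method: copies input to output in BUFFER_SIZE blocks. */
int store_compress (INOUT_FUNC *read_f, INOUT_FUNC *write_f)
{
    char *buf = (char *) malloc (BUFFER_SIZE);
    int result = 0;
    if (buf == NULL)
        return -1;                       /* assumed error code: no memory */
    for (;;) {
        int insize = read_f (buf, BUFFER_SIZE);
        if (insize <= 0) {               /* 0: end of input, <0: error code */
            result = insize;
            break;                       /* read_f is never called after 0 */
        }
        result = write_f (buf, insize);
        if (result < 0)
            break;                       /* pass the error code to the caller */
        result = 0;                      /* assumed success code */
    }
    free (buf);                          /* release everything allocated here */
    return result;
}

/* For this method unpacking is the same copy loop. */
int store_decompress (INOUT_FUNC *read_f, INOUT_FUNC *write_f)
{
    return store_compress (read_f, write_f);
}

/* Tiny in-memory callbacks for trying the functions out. */
const char *src;
int src_len, src_pos;
char dst[256];
int dst_len;

int mem_read (void *buf, int size)
{
    int n = src_len - src_pos;
    if (n > size) n = size;
    memcpy (buf, src + src_pos, (size_t) n);
    src_pos += n;
    return n;                            /* 0 once the input is exhausted */
}

int mem_write (void *buf, int size)
{
    if (dst_len + size > (int) sizeof dst)
        return -2;                       /* assumed error code: output full */
    memcpy (dst + dst_len, buf, (size_t) size);
    dst_len += size;
    return size;
}
```

To try it, point src at some data, set src_len, reset src_pos and dst_len to zero, and call store_compress (mem_read, mem_write); afterwards dst holds a copy of the input.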

 

Once you have defined the packing and unpacking functions in C/C++ (to do: provide code for testing the packing/unpacking algorithms directly), you can start connecting them to FreeArc. Add a call to make for your makefile to Compression/compile.cmd, and add the object file of your compression method to compile.btm next to c_grzip.o.