About
Genetic Algorithm File Fitter, or just GAFFitter, is a command-line program written in C++ that uses a genetic algorithm to arrange an input list of items or files/directories into volumes of a given capacity (the target), such as CDs or DVDs, so that the total wastage is minimized. By arranging the input list intelligently, GAFFitter packs the given items more tightly and thus reduces the number of volumes required to hold them.
Currently, GAFFitter runs on GNU/Linux and other POSIX systems, but it is designed so that it can easily be extended to non-POSIX operating environments.
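To make the objective concrete: the wastage of a volume is its unused space (the target minus the sizes placed in it), and no arrangement can use fewer than ceil(total / target) volumes, presumably the "theoretical minimum number of bins" that the --ga-theo option refers to. The Python sketch below computes both quantities. The helper names are hypothetical and not part of GAFFitter; the numbers come from the first Direct Input example under Examples.

```python
import math

def theoretical_min_bins(sizes, target):
    """Lower bound on the number of volumes: ceil(total size / capacity)."""
    return math.ceil(sum(sizes) / target)

def total_wastage(volumes, target):
    """Unused space summed over all volumes of a packing."""
    for vol in volumes:
        if sum(vol) > target:
            raise ValueError(f"volume {vol} exceeds the target {target}")
    return sum(target - sum(vol) for vol in volumes)

# Item sizes and target from the Direct Input example (unitless numbers).
sizes = [1, 2.4, 0.3, 0.5, 1.23456789]
packing = [[0.3, 1.23456789, 0.5, 1], [2.4]]   # the two volumes reported there

print(theoretical_min_bins(sizes, 3.14))        # 2
print(round(total_wastage(packing, 3.14), 8))   # 0.84543211
```

Because every item ends up in some volume, total wastage equals n × target minus the total size, so minimizing wastage and minimizing the number of volumes n are the same goal: here 2 × 3.14 − 5.43456789 = 0.84543211.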
Features
GAFFitter has five key features:
- Global search using a meta-heuristic (a genetic algorithm).
- Its command-line interface integrates easily with other tools via pipes, i.e. it works as a "filter".
- The user can enter 'size identifier' pairs directly instead of file/dir names.
- Highly configurable: GAFFitter has many input parameters to control/adjust its behavior (including GA parameters).
- It is Free Software! (GPL)
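The first feature, genetic algorithm search, can be illustrated with a small Python sketch that picks a subset of items filling one volume as closely as possible to the target. This is not GAFFitter's implementation: the bit-string encoding, one-point crossover, single-bit mutation and the penalty for overfull volumes are our assumptions, as is the idea that volumes are filled one at a time (suggested by the example output under Examples, where each numbered volume reports the items selected out of those still remaining). The parameter names mirror the --ga-* options under Usage and use their documented defaults where given (seed 1, crossover 0.75, mutation 0.30, tournament size 2); the population size and number of generations, which GAFFitter sets automatically, are arbitrary here, and fill_one_volume is a hypothetical name.

```python
import random

def fitness(genome, sizes, target):
    """Space filled by the selected items; negative if the volume overflows."""
    used = sum(s for bit, s in zip(genome, sizes) if bit)
    return used if used <= target else target - used

def tournament(pop, fits, k, rng):
    """Tournament selection: the fittest of k randomly chosen individuals."""
    winner = max(rng.sample(range(len(pop)), k), key=lambda i: fits[i])
    return pop[winner][:]                 # copy, so children don't alias parents

def fill_one_volume(sizes, target, seed=1, pop_size=40, generations=200,
                    cross_prob=0.75, mut_prob=0.30, tour_size=2):
    """Hypothetical GA: choose items whose total is as close to target as possible."""
    if not sizes:
        return []
    rng = random.Random(seed)
    n = len(sizes)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best, best_fit = [0] * n, 0           # empty volume: always valid
    for _ in range(generations):
        fits = [fitness(g, sizes, target) for g in pop]
        i = max(range(pop_size), key=lambda j: fits[j])
        if fits[i] > best_fit:
            best, best_fit = pop[i][:], fits[i]
        if best_fit == target:            # perfect fill: stop early
            break
        children = []
        while len(children) < pop_size:
            a = tournament(pop, fits, tour_size, rng)
            b = tournament(pop, fits, tour_size, rng)
            if n > 1 and rng.random() < cross_prob:   # one-point crossover
                cut = rng.randrange(1, n)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                if rng.random() < mut_prob:           # flip one random bit
                    child[rng.randrange(n)] ^= 1
                children.append(child)
        pop = children[:pop_size]
    return [s for bit, s in zip(best, sizes) if bit]

chosen = fill_one_volume([5, 7, 3, 9, 4, 6], 20)
print(chosen, sum(chosen))
```

Packing a whole input this way would repeat the call on the items left over until none remain.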
Usage
Usage: fit -t target[unit] [options...] <files>
       ... | fit - -t target[unit] [options...] [files]

  the unit suffixes 'k', 'm', 'g' or 't' can be used, where:
  k = KB/KiB, m = MB/MiB, g = GB/GiB and t = TB/TiB [default = bytes]

General options:
  -t <f>[unit], --target <f>[unit]         target size (mandatory), f>0.0
  --si                                     use powers of 1000 (not 1024) for target, min, max and output sizes
  --bins <n>, --vols <n>                   maximum number of bins (volumes) [default = "unlimited"]
  -v, --verbose                            verbose
  --min <f>[unit], --min-size <f>[unit]    minimum file size [default = none]
  --max <f>[unit], --max-size <f>[unit]    maximum file size [default = none]
  -B <n>, --block-size <n>                 the smallest amount of bytes a file can occupy [default = 1]
  --ss, --show-size                        print the size of each file
  --sb, --show-bytes                       also print the sizes in bytes
  --hi, --hide-items                       don't print the selected items
  --hs, --hide-summary                     hide summary line containing sum, difference and number of selected items
  -s, --sort-by-size                       sort the output by size, not by name
  -n, --no-case                            use case-insensitive sorting
  -r, --sort-reverse                       sort the output in reverse order
  -z, --null-data                          assume NULL (\0) as the delimiter of input files via stdin (pipe)
  -Z, --null                               same as --dw '\0'. See also the -0 and --hs options
  -0, --null-bins                          same as --bs '\0'. See also the -Z and --hs options
  --bs <char>, --bins-separator <char>     separate bins (vols) with "char" [default = newline]
  --ew <char>, --enclose-with <char>       enclose file names with "char" [default = none]
  --dw <char>, --delimit-with <char>       delimit file names (lines) with "char" [default = newline]
  -1 (or --fast) ... -9 (or --best)        select preset search parameters [default = -3]
  --version                                print GAFFitter version and exit
  -h, --help                               print this help and exit

Direct Input options:
  --di, --direct-input                     switch to direct input mode, i.e., read directly "size identifier" pairs instead of file names
  --di-b, --di-bytes                       assume input sizes as bytes
  --di-k, --di-kb                          assume input sizes as kibi bytes (KiB); KB if --di-si
  --di-m, --di-mb                          assume input sizes as mebi bytes (MiB); MB if --di-si
  --di-g, --di-gb                          assume input sizes as gibi bytes (GiB); GB if --di-si
  --di-t, --di-tb                          assume input sizes as tebi bytes (TiB); TB if --di-si
  --di-si                                  use powers of 1000 (not 1024) for input sizes

Genetic Algorithm options:
  --ga-s <n>, --ga-seed <n>                GA initialization seed, n>=0 [default = 1]; 0 = random
  --ga-rs, --ga-random-seed                use random GA seed (same as --ga-seed 0)
  --ga-ng <n>, --ga-num-generations <n>    maximum number of generations, n>0 [default = auto]
  --ga-ps <n>, --ga-pop-size <n>           number of individuals, n>tournament_size [default = auto]
  --ga-cp <f>, --ga-cross-prob <f>         crossover probability, 0.0<=f<=1.0 [default = 0.75]
  --ga-mp <f>, --ga-mutation-prob <f>      mutation probability, 0.0<=f<=1.0 [default = 0.30]
  --ga-sp <n>, --ga-sel-pressure <n>       selection pressure (tournament size), 2<=n<pop_size [default = 2]
  --ga-theo [n], --ga-theoretical [n]      stop if the theoretical minimum number of bins is reached. If n is given, it is assumed to be the theoretical minimum number of bins.

Other search methods:
  --ap, --approximate                      local approximation using Best Fit search (non-optimal but very fast)
  --sp, --split                            just split the input when target size is reached (preserves original order while splitting)
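The --ap option is described only as a local approximation using Best Fit search. For reference, here is a Python sketch of the textbook Best Fit heuristic: each item goes into the open volume it fills most tightly, and a new volume is opened only when no existing one has room. Whether GAFFitter orders the items first, or applies the heuristic exactly this way, is not stated here; best_fit is a hypothetical name.

```python
def best_fit(sizes, target):
    """Textbook Best Fit bin packing; returns a list of volumes (lists of sizes)."""
    volumes, free = [], []          # free[i] = space left in volumes[i]
    for s in sizes:
        if s > target:
            raise ValueError(f"item of size {s} is larger than the target {target}")
        candidates = [i for i, f in enumerate(free) if f >= s]
        if candidates:
            i = min(candidates, key=lambda j: free[j] - s)   # tightest fit
            volumes[i].append(s)
            free[i] -= s
        else:                        # nothing has room: open a new volume
            volumes.append([s])
            free.append(target - s)
    return volumes

print(best_fit([5, 7, 3, 9, 4, 6], 10))   # [[5, 4], [7, 3], [9], [6]]
```

Sorting the sizes in decreasing order first (Best Fit Decreasing) often packs better, at the cost of discarding the input order.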
Examples
Simple usage
~$ fit -t 700m *.ogg
brahms_4_balladen_op_10_3.ogg
iii_sarka.ogg
...
quatuor_gdur_grave.ogg
[1] 700.00MiB/700.00MiB of 4.55GiB, Diff: 1Bytes, Items: 296/861

francesca_da_rimini_op_32.ogg
sinf_en_re_menor_lento.ogg
...
zapateado_allegro_vivace.ogg
[2] 700.00MiB/700.00MiB of 4.55GiB, Diff: 59Bytes, Items: 43/565
.
.
.
Limiting the number of volumes (--bins/--vols option)
~$ fit -t 4.37g --bins 1 *
beethoven_melos_quartett
beethoven_ludwig_van
bruch_dvorak
...
richard_wagner
wolf_strauss
[1] 4.37GiB/4.37GiB of 21.57GiB, Diff: 1.11KiB, Items: 14/68
Input via stdin (pipes)
~$ find . -type f | fit - -t 1.4m
./fit
./optimizers/GeneticAlgorithm.o
./Input.o
...
./util/CVS/Repository
[1] 1.40MiB/1.40MiB of 1.49MiB, Diff: 0Bytes, Items: 20/39
...
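File names read from a pipe are separated by newlines by default, or by NUL bytes with -z (--null-data). The NUL form is the safe choice when names may contain newlines, e.g. `find . -type f -print0 | fit - -z -t 1.4m`. The Python sketch below only illustrates how such input splits into names; read_names is a hypothetical helper, not part of GAFFitter.

```python
def read_names(data, null_data=False):
    """Split piped bytes into file names: NUL-delimited (-z) or newline-delimited."""
    sep = b"\0" if null_data else b"\n"
    # skip empty entries, e.g. the one after a trailing delimiter
    return [name for name in data.split(sep) if name]

print(read_names(b"./fit\n./Input.o\n"))
# [b'./fit', b'./Input.o']
print(read_names(b"./odd\nname.ogg\0./Input.o\0", null_data=True))
# [b'./odd\nname.ogg', b'./Input.o']
```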
Using the --split option
This option is useful when you need to preserve the order of the given files/items: the input is simply split whenever the target size would be exceeded. This method usually wastes more space, though.
Suppose you have some music files that you want to pack into volumes of 10MiB each, while preserving their input order:
~$ fit -t 10m --split *
01_track1.ogg 3.73MiB
02_track2.ogg 5.42MiB
[1] 9.15MiB/10.00MiB of 77.28MiB, Diff: 870.00KiB, Items: 2/18

03_track3.ogg 3.37MiB
04_track4.ogg 3.69MiB
[2] 7.06MiB/10.00MiB of 77.28MiB, Diff: 2.94MiB, Items: 2/16
...
Without this ordering restriction, of course, the volumes are filled much more tightly:
~$ fit -t 10m *
03_track3.ogg 3.37MiB
05_track5.ogg 3.49MiB
11_track11.ogg 3.14MiB
[1] 10.00MiB/10.00MiB of 77.28MiB, Diff: 2.00KiB, Items: 3/18

02_track2.ogg 5.42MiB
13_track13.ogg 4.54MiB
[2] 9.96MiB/10.00MiB of 77.28MiB, Diff: 38.00KiB, Items: 2/15
...
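The splitting rule used in the first of these two examples amounts to a single pass over the items in their original order, starting a new volume whenever the next item would overflow the current one. The Python sketch below is our reading of that description, not GAFFitter's source (split_in_order is a hypothetical name). Fed the sizes of tracks 01 to 05 from the two examples above (in MiB), it reproduces the first two volumes of the --split run.

```python
def split_in_order(sizes, target):
    """Split items into volumes, preserving order; start a new volume on overflow."""
    volumes, current, used = [], [], 0
    for s in sizes:
        if s > target:
            raise ValueError(f"item of size {s} is larger than the target {target}")
        if current and used + s > target:   # next item would overflow: close volume
            volumes.append(current)
            current, used = [], 0
        current.append(s)
        used += s
    if current:
        volumes.append(current)
    return volumes

# sizes in MiB of tracks 01-05 from the examples above, target 10MiB
print(split_in_order([3.73, 5.42, 3.37, 3.69, 3.49], 10))
# [[3.73, 5.42], [3.37, 3.69], [3.49]]
```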
More features
~$ fit -t 600k --show-size src/*
src/optimizers 447.38KiB
src/DiskUsage.o 104.19KiB
src/util 32.42KiB
src/Input.cc 4.10KiB
src/Optimizer.hh 3.40KiB
src/Input.hh 3.20KiB
src/Params.hh 3.17KiB
src/DiskUsage.hh 2.15KiB
[1] 600.00KiB/600.00KiB of 1.50MiB, Diff: 1Bytes, Items: 8/19
...
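The sizes and the summary line above use binary units with two decimals. The Python sketch below mimics that format; the unit-selection rule (move to the next unit once a value reaches 1024) is an assumption inferred from the examples, format_size and summary_line are hypothetical names, and the byte counts in the demo are illustrative values chosen to be consistent with the summary line above, not data from GAFFitter.

```python
UNITS = ["KiB", "MiB", "GiB", "TiB"]

def format_size(n_bytes):
    """Format a byte count like the examples: '1Bytes', '447.38KiB', '1.50MiB'."""
    if n_bytes < 1024:
        return f"{n_bytes}Bytes"
    value = n_bytes / 1024
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f}{unit}"
        value /= 1024

def summary_line(index, used, target, total, n_selected, n_remaining):
    """Build a '[i] used/target of total, Diff: ..., Items: k/m' line."""
    return (f"[{index}] {format_size(used)}/{format_size(target)} "
            f"of {format_size(total)}, Diff: {format_size(target - used)}, "
            f"Items: {n_selected}/{n_remaining}")

print(summary_line(1, 614399, 614400, 1572864, 8, 19))
# [1] 600.00KiB/600.00KiB of 1.50MiB, Diff: 1Bytes, Items: 8/19
```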
Direct Input
~$ fit -t 3.14 --di --ss '1 id one' '2.4 ID2' '0.3 b' '0.5 foo' '1.23456789 bar'
b 0.3
bar 1.23456789
foo 0.5
id one 1
[1] 3.03456789/3.14 of 5.43456789, Diff: 0.10543211, Items: 4/5

ID2 2.4
[2] 2.4/3.14 of 5.43456789, Diff: 0.74, Items: 1/1

~$ du * | fit - -t 200 --di
optimizers
util/CVS
Makefile
Exception.hh
[1] 200/200 of 536, Diff: 0, Items: 4/22
...

~$ du * | fit - -t 200k --di --di-k
optimizers
util/CVS
Makefile
Exception.hh
[1] 200.00KiB/200.00KiB of 536.00KiB, Diff: 0Bytes, Items: 4/22
...
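In these examples each direct-input entry is a "size identifier" pair. Judging from the first example, the size is everything before the first whitespace and the identifier is the rest, so '1 id one' names the item "id one". By default GNU du reports usage in units of 1024 bytes, which is why the last example adds --di-k. Following the usage text, the k/m/g/t suffixes of -t and the --di-* units are powers of 1024 unless --si/--di-si select powers of 1000. A Python sketch of both rules (parse_size and parse_direct_input are hypothetical helpers, not GAFFitter's parser):

```python
FACTORS = {"": 0, "b": 0, "k": 1, "m": 2, "g": 3, "t": 4}   # power of the base

def parse_size(text, si=False):
    """Parse a target such as '700m', '4.37g' or '3.14' (no suffix: plain number)."""
    text = text.strip().lower()
    suffix = text[-1:] if text[-1:] in ("k", "m", "g", "t") else ""
    number = float(text[:-1]) if suffix else float(text)
    return number * (1000 if si else 1024) ** FACTORS[suffix]

def parse_direct_input(line, unit="", si=False):
    """Split a 'size identifier' pair at the first whitespace; apply the --di-* unit."""
    size, identifier = line.strip().split(None, 1)
    return identifier, float(size) * (1000 if si else 1024) ** FACTORS[unit]

print(parse_size("700m"))                               # 734003200.0
print(parse_size("200k", si=True))                      # 200000.0
print(parse_direct_input("1 id one"))                   # ('id one', 1.0)
print(parse_direct_input("200\toptimizers", unit="k"))  # ('optimizers', 204800.0)
```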
Or if you prefer, some screenshots can be found here. :)
GAFFitter's scripts
The official GAFFitter scripts can be downloaded here.
Creation of ISO9660 image files
Being a filter, GAFFitter can be used for many tasks involving the packing of files and directories. For instance, the Python script gaff-iso creates CD/DVD ISO9660 images using genisoimage or mkisofs. The syntax is as follows:
Usage: gaff-iso --{dvd|dvd-dl|cd|cd74|cd90|cd99|size n} [OPTION] files/dirs

Options:
  --dvd                   create volumes of 4.38GB (DVD)
  --dvd-dl                create volumes of 7.95GB (Dual Layer DVD)
  --cd                    create volumes of 702MB (80min CD)
  --cd74                  create volumes of 650MB (74min CD)
  --cd90                  create volumes of 790MB (90min CD)
  --cd99                  create volumes of 870MB (99min CD)
  --size n[k,m,g,t]       create custom volumes of n KiB/MiB/GiB/TiB each
  --split                 just splits the input (i.e. preserves original order)
  --vols n                maximum number of volumes (default = as much as possible)
  -o dir, --output dir    output directory for the ISO images
  -v                      verbose mode
  -y                      overwrite any existing output files

Advanced options:
  --mkisofs-opts "opts"   use custom mkisofs options (default = "-D -r -J -joliet-long")

Example: gaff-iso --dvd *
The generated image files will be stored in the current directory (or in the directory specified via the -o option) and will be named CD-0001.iso, ..., CD-000n.iso and DVD-0001.iso, ..., DVD-000n.iso for CD and DVD images, respectively, where n is the number of volumes (bins) used to pack the given list of files/dirs.
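For each volume, gaff-iso runs genisoimage or mkisofs with the options shown in its usage text (by default "-D -r -J -joliet-long") and writes an image named as described above. The Python sketch below only builds such a command line for one volume; iso_command is hypothetical, and the four-digit zero padding is inferred from the names above. Note that genisoimage/mkisofs place the contents of a directory argument, not the directory itself, at the image root unless graft points (-graft-points) are used; how gaff-iso handles this is not documented here.

```python
import os
import shlex

DEFAULT_OPTS = "-D -r -J -joliet-long"    # gaff-iso's documented mkisofs defaults

def iso_command(volume, files, kind="DVD", out_dir=".", opts=DEFAULT_OPTS,
                program="genisoimage"):
    """Return the argv that would build <out_dir>/<kind>-NNNN.iso from files."""
    image = os.path.join(out_dir, f"{kind}-{volume:04d}.iso")
    return [program, *shlex.split(opts), "-o", image, *files]

print(iso_command(1, ["beethoven_ludwig_van", "richard_wagner"]))
# ['genisoimage', '-D', '-r', '-J', '-joliet-long', '-o', './DVD-0001.iso',
#  'beethoven_ludwig_van', 'richard_wagner']
```

Passing the argv as a list to subprocess.run(cmd, check=True), rather than building a shell string, keeps file names with spaces intact.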
K3B and GAFFitter
Similar to gaff-iso, the usage information for gaff-k3b is:
Usage: gaff-k3b --{dvd|dvd-dl|cd|cd74|cd90|cd99|size n} [OPTION] files/dirs

Options:
  --dvd                   create volumes of 4.38GB (DVD)
  --dvd-dl                create volumes of 7.95GB (Dual Layer DVD)
  --cd                    create volumes of 702MB (80min CD)
  --cd74                  create volumes of 650MB (74min CD)
  --cd90                  create volumes of 790MB (90min CD)
  --cd99                  create volumes of 870MB (99min CD)
  --size n[k,m,g,t]       create custom volumes of n KiB/MiB/GiB/TiB each
  --split                 just splits the input (i.e. preserves original order)
  --vols n                maximum number of volumes (default = as much as possible)
  -v                      verbose mode

Example: gaff-k3b --dvd *
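gaff-iso and gaff-k3b accept the same size presets, which a wrapper of this kind has to translate into a fit target. The Python sketch below shows one way to do that. The mapping of, say, "4.38GB" to fit's "4.38g" (a binary unit) is our assumption, as is the hypothetical helper fit_args; the real scripts may pass other options to fit as well.

```python
PRESETS = {            # volume sizes from the usage text (binary units assumed)
    "dvd": "4.38g", "dvd-dl": "7.95g", "cd": "702m",
    "cd74": "650m", "cd90": "790m", "cd99": "870m",
}

def fit_args(files, preset=None, size=None, split=False, vols=None):
    """Build a fit command line from a wrapper's options (hypothetical mapping)."""
    if (preset is None) == (size is None):
        raise ValueError("give exactly one of preset or size")
    target = PRESETS[preset] if preset is not None else size
    args = ["fit", "-t", target]
    if split:
        args.append("--split")
    if vols is not None:
        args += ["--vols", str(vols)]
    return args + list(files)

print(fit_args(["a", "b"], preset="cd", split=True, vols=2))
# ['fit', '-t', '702m', '--split', '--vols', '2', 'a', 'b']
```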
Nautilus Scripts for gaff-k3b
For those who use Nautilus, the following scripts may be useful:
- nautilus-gaff-k3b-cd
  - Packs files/dirs into CD volumes and burns them using K3B
- nautilus-gaff-k3b-cd-split
  - Packs files/dirs into CD volumes and burns them using K3B, preserving the original order
- nautilus-gaff-k3b-dvd
  - Packs files/dirs into DVD volumes and burns them using K3B
- nautilus-gaff-k3b-dvd-split
  - Packs files/dirs into DVD volumes and burns them using K3B, preserving the original order
Installation:
- Make the script files executable and put them in ~/.gnome2/nautilus-scripts/
Usage:
- Under Nautilus, select the files/dirs to be packed
- Right-click the selection and open the Scripts entry in the context menu
- Select one of the scripts listed above
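Nautilus exports the current selection to scripts in the environment variable NAUTILUS_SCRIPT_SELECTED_FILE_PATHS, one path per line. The scripts above do their work with awk, sed and echo; the Python sketch below is only an illustration of the same idea and assumes they read that variable. gaff_k3b_command is a hypothetical name, and the sketch merely builds the command rather than running it.

```python
def gaff_k3b_command(env, media="--cd", split=False):
    """Build a gaff-k3b argv from the selection Nautilus exports to scripts."""
    raw = env.get("NAUTILUS_SCRIPT_SELECTED_FILE_PATHS", "")
    paths = [p for p in raw.split("\n") if p]        # one path per line
    return ["gaff-k3b", media] + (["--split"] if split else []) + paths

sample_env = {"NAUTILUS_SCRIPT_SELECTED_FILE_PATHS": "/home/me/music dir\n/home/me/b.ogg\n"}
print(gaff_k3b_command(sample_env, media="--dvd", split=True))
# ['gaff-k3b', '--dvd', '--split', '/home/me/music dir', '/home/me/b.ogg']
```

In a real script one would pass os.environ instead of the sample dictionary and hand the list to subprocess.run.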
Note: those scripts require gaff-k3b (see gaffitter.sf.net), awk, sed and echo
Resources: K3B website, Nautilus File Manager Scripts
Brasero and GAFFitter
gaff-brasero is a Python script that integrates GAFFitter with the Brasero CD/DVD burner. (Thanks to Mark Edgington)
Usage: gaff-brasero --{dvd|dvd-dl|cd|cd74|cd90|cd99|size n} [OPTION] files/dirs

Options:
  --dvd                   create volumes of 4.38GB (DVD)
  --dvd-dl                create volumes of 7.95GB (Dual Layer DVD)
  --cd                    create volumes of 702MB (80min CD)
  --cd74                  create volumes of 650MB (74min CD)
  --cd90                  create volumes of 790MB (90min CD)
  --cd99                  create volumes of 870MB (99min CD)
  --size n[k,m,g,t]       create custom volumes of n KiB/MiB/GiB/TiB each
  --split                 just splits the input (i.e. preserves original order)
  --vols n                maximum number of volumes (default = as much as possible)
  -v                      verbose mode

Example: gaff-brasero --dvd *
Note: Unlike K3B, Brasero follows symbolic links automatically; at present, Brasero provides no means to disable this behaviour. Also, Brasero can only handle one volume at a time, so when the output spans multiple volumes, Brasero is called sequentially, once per volume.
GAFFitter as a filter
Following symbolic links using GNU Disk Usage (du)
GAFFitter doesn't follow symbolic links; however, the same result can be achieved by using du to obtain the file/dir sizes:
du -Lbs <files/dirs> | fit - --di --di-b <user options>
For example
fit -t 700m -B 2048 *
is equivalent to
du -Lbs * | fit - --di --di-b -t 700m -B 2048
except for the symbolic link dereference.
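The equivalence rests on the du options: -L dereferences symbolic links, -b reports apparent sizes in bytes (it is shorthand for --apparent-size --block-size=1) and -s prints one total per argument, each line being the size, a tab, and the name. That is exactly the "size identifier" format read by --di. The Python sketch below parses such a line and applies a block size. The help text only says -B is "the smallest amount of bytes a file can occupy", so rounding each size up to a whole number of blocks is our assumption; parse_du_line and occupied are hypothetical names, and the file name in the demo is made up.

```python
def parse_du_line(line):
    """Split a 'du -b' output line ('<bytes>\t<name>') into (name, size)."""
    size, name = line.rstrip("\n").split("\t", 1)
    return name, int(size)

def occupied(size, block=1):
    """Round a size up to a whole number of blocks (assumed meaning of -B)."""
    return -(-size // block) * block       # ceiling division without floats

name, size = parse_du_line("734003\tdocs/manual.pdf\n")
print(name, occupied(size, 2048))          # docs/manual.pdf 735232
```

With -B 2048 (the ISO 9660 sector size), a 734003-byte file would be counted as 359 sectors, i.e. 735232 bytes.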
- Resources: GNU du manpage
License
GAFFitter is licensed under the GNU General Public License (GPL) Version 3 (or later), June 2007
Related software
- Scdbackup (C)
- Burn to the Brim (Delphi/Kylix, Win32 only)
- Split2cds (C, non-commercial usage)
- Disk Optimizer (C)
- Combine-CD (Pike)
- File Fitter (KDE) (C++/Qt)
- Genetic monodimensional packing (o-paque) (C++)
- Dirsplit (Perl)
- Sync2CD (Python)