Latest version (1.0.0)
Author

Douglas A. Augusto daaugusto at gmail.com




Download Genetic Algorithm File Fitter

Last updated on Mon Aug 3 14:59:03 2015.

About

Genetic Algorithm File Fitter, or just GAFFitter, is a command-line program written in C++ that uses a genetic algorithm to arrange an input list of items or files/directories into volumes of a given capacity (the target), such as CDs or DVDs, so that the total wastage is minimized. By arranging the input list smartly, GAFFitter fits the given items more tightly and thus reduces the number of volumes required to pack them.
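
To make the objective concrete, here is a minimal Python sketch of the two quantities mentioned above: the wastage of a given arrangement and a lower bound on the number of volumes. The function names and the item sizes are hypothetical and chosen only for illustration; this is not GAFFitter's code.

```python
import math

def min_volumes(sizes, target):
    """Lower bound on the number of volumes: total size over capacity, rounded up."""
    return math.ceil(sum(sizes) / target)

def wastage(volumes, target):
    """Unused space summed over all volumes; each volume is a list of item sizes."""
    return sum(target - sum(v) for v in volumes)

sizes = [6, 5, 4, 3, 2]          # hypothetical item sizes, arbitrary units
target = 10                      # hypothetical volume capacity

naive = [[6], [5, 4], [3, 2]]    # items taken in input order: 3 volumes
better = [[6, 4], [5, 3, 2]]     # rearranged: 2 volumes, nothing wasted

print(min_volumes(sizes, target))   # 2
print(wastage(naive, target))       # 10
print(wastage(better, target))      # 0
```

The rearranged packing reaches the lower bound of two volumes, which is the kind of result a search over arrangements aims for.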

Currently, GAFFitter runs on GNU/Linux and other POSIX systems, but it is designed so that it should be easy to extend to non-POSIX operating environments.

Features

There are five key features behind GAFFitter, namely:

Usage

Usage: fit -t target[unit] [options...] <files>
       ... | fit - -t target[unit] [options...] [files]

  the unit suffixes 'k', 'm', 'g' or 't' can be used, where:
     k = KB/KiB, m = MB/MiB, g = GB/GiB and t = TB/TiB [default = bytes]
General options:
  -t <f>[unit], --target <f>[unit]
     target size (mandatory), f>0.0
  --si
     use powers of 1000 (not 1024) for target, min, max and output sizes
  --bins <n>, --vols <n>
     maximum number of bins (volumes) [default = "unlimited"]
  -v, --verbose
     verbose
  --min <f>[unit], --min-size <f>[unit]
     minimum file size [default = none]
  --max <f>[unit], --max-size <f>[unit]
     maximum file size [default = none]
  -B <n>, --block-size <n>
     the smallest amount of bytes a file can occupy [default = 1]
  --ss, --show-size
     print the size of each file
  --sb, --show-bytes
     also print the sizes in bytes
  --hi, --hide-items
     don't print the selected items
  --hs, --hide-summary
     hide summary line containing sum, difference and number of
     selected items
  -s, --sort-by-size
     sort the output by size, not by name
  -n, --no-case
     use case-insensitive sorting
  -r, --sort-reverse
     sort the output in reverse order
  -z, --null-data
     assume NULL (\0) as the delimiter of input files via stdin (pipe)
  -Z, --null
     same as --dw '\0'. See also the -0 and --hs options
  -0, --null-bins
     same as --bs '\0'. See also the -Z and --hs options
  --bs <char>, --bins-separator <char>
     separate bins (vols) with "char" [default = newline]
  --ew <char>, --enclose-with <char>
     enclose file names with "char" [default = none]
  --dw <char>, --delimit-with <char>
     delimit file names (lines) with "char" [default = newline]
  -1 (or --fast) ... -9 (or --best)
     select preset search parameters [default = -3]
  --version
     print GAFFitter version and exit
  -h, --help
     print this help and exit
Direct Input options:
  --di, --direct-input
     switch to direct input mode, i.e., read directly "size identifier"
     pairs instead of file names
  --di-b, --di-bytes
     assume input sizes as bytes
  --di-k, --di-kb
     assume input sizes as kibi bytes (KiB); KB if --di-si
  --di-m, --di-mb
     assume input sizes as mebi bytes (MiB); MB if --di-si
  --di-g, --di-gb
     assume input sizes as gibi bytes (GiB); GB if --di-si
  --di-t, --di-tb
     assume input sizes as tebi bytes (TiB); TB if --di-si
  --di-si
     use powers of 1000 (not 1024) for input sizes
Genetic Algorithm options:
  --ga-s <n>, --ga-seed <n>
     GA initialization seed, n>=0 [default = 1]; 0 = random
  --ga-rs, --ga-random-seed
     use random GA seed (same as --ga-seed 0)
  --ga-ng <n>, --ga-num-generations <n>
     maximum number of generations, n>0 [default = auto]
  --ga-ps <n>, --ga-pop-size <n>
     number of individuals, n>tournament_size [default = auto]
  --ga-cp <f>, --ga-cross-prob <f>
     crossover probability, 0.0<=f<=1.0 [default = 0.75]
  --ga-mp <f>, --ga-mutation-prob <f>
     mutation probability, 0.0<=f<=1.0 [default = 0.30]
  --ga-sp <n>, --ga-sel-pressure <n>
     selection pressure (tournament size), 2<=n<pop_size [default = 2]
  --ga-theo [n], --ga-theoretical [n]
     stop if the theoretical minimum number of bins is reached. If n is
     given, it is assumed to be the theoretical minimum number of bins.
Other search methods
  --ap, --approximate
     local approximation using Best Fit search (non-optimal but
     very fast)
  --sp, --split
     just split the input when target size is reached (preserves
     original order while splitting)
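
The --ap option above is described as a Best Fit search. As a rough illustration of that heuristic in its textbook form (a sketch under that assumption, not necessarily what GAFFitter implements), each item goes into the bin that would be left with the least free space, and a new bin is opened only when no existing bin has room. The function name and the choice to reject oversized items are hypothetical.

```python
def best_fit(sizes, target):
    """Place each item into the fullest bin that still has room for it."""
    bins = []  # each bin is a list of item sizes
    for size in sizes:
        if size > target:
            # Choice made for this sketch only: refuse items larger than a bin.
            raise ValueError(f"item of size {size} exceeds the target {target}")
        best = None
        best_left = None
        for b in bins:
            left = target - sum(b) - size  # free space if the item went here
            if left >= 0 and (best_left is None or left < best_left):
                best, best_left = b, left
        if best is None:
            bins.append([size])  # no bin has room: open a new one
        else:
            best.append(size)
    return bins

print(best_fit([6, 5, 4, 3, 2], 10))  # [[6, 4], [5, 3, 2]]
```

Best Fit makes a single pass and never revisits a placement, which is why it is fast but can miss packings that a longer search would find.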

Examples

Simple usage

~$ fit -t 700m *.ogg
brahms_4_balladen_op_10_3.ogg
iii_sarka.ogg
...
quatuor_gdur_grave.ogg

[1] 700.00MiB/700.00MiB of 4.55GiB, Diff: 1Bytes, Items: 296/861

francesca_da_rimini_op_32.ogg
sinf_en_re_menor_lento.ogg
...
zapateado_allegro_vivace.ogg

[2] 700.00MiB/700.00MiB of 4.55GiB, Diff: 59Bytes, Items: 43/565

.
.
.

Limiting the number of volumes (--bins/--vols option)

~$ fit -t 4.37g --bins 1 *
beethoven_melos_quartett
beethoven_ludwig_van
bruch_dvorak
... 
richard_wagner
wolf_strauss

[1] 4.37GiB/4.37GiB of 21.57GiB, Diff: 1.11KiB, Items: 14/68
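
The targets in the examples above carry a unit suffix (700m, 4.37g). The following Python sketch shows how such a value maps to bytes according to the usage text: powers of 1024 by default, powers of 1000 with --si. The function name parse_size is hypothetical, and the sketch only handles the lowercase suffixes listed in the usage.

```python
def parse_size(text, si=False):
    """Convert a string such as '700m' or '4.37g' into a number of bytes."""
    exponents = {"k": 1, "m": 2, "g": 3, "t": 4}
    base = 1000 if si else 1024       # si=True mirrors the --si switch
    suffix = text[-1]
    if suffix in exponents:
        return float(text[:-1]) * base ** exponents[suffix]
    return float(text)                # no suffix: the value is already in bytes

print(parse_size("700m"))             # 734003200.0
print(parse_size("700m", si=True))    # 700000000.0
```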

Input via stdin (pipes)

~$ find . -type f | fit - -t 1.4m
./fit
./optimizers/GeneticAlgorithm.o
./Input.o
...
./util/CVS/Repository

[1] 1.40MiB/1.40MiB of 1.49MiB, Diff: 0Bytes, Items: 20/39
...

Using the --split option

This option is useful when you need to preserve the order of the given files/items: the input is simply split according to the target size. This method usually wastes more space, though.

Suppose you have a few music files and you want to generate volumes of 10MiB each, but preserving the input order is important to you:

~$ fit -t 10m --split *
01_track1.ogg   3.73MiB
02_track2.ogg   5.42MiB

[1] 9.15MiB/10.00MiB of 77.28MiB, Diff: 870.00KiB, Items: 2/18

03_track3.ogg   3.37MiB
04_track4.ogg   3.69MiB

[2] 7.06MiB/10.00MiB of 77.28MiB, Diff: 2.94MiB, Items: 2/16
...

Of course, without this ordering restriction the volumes are filled more tightly:

~$ fit -t 10m *
03_track3.ogg   3.37MiB
05_track5.ogg   3.49MiB
11_track11.ogg  3.14MiB

[1] 10.00MiB/10.00MiB of 77.28MiB, Diff: 2.00KiB, Items: 3/18

02_track2.ogg   5.42MiB
13_track13.ogg  4.54MiB

[2] 9.96MiB/10.00MiB of 77.28MiB, Diff: 38.00KiB, Items: 2/15
...
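
The order-preserving behaviour of --split can be sketched in a few lines of Python. This is an illustration of the idea only, assuming a new volume is started whenever the next item would exceed the target; the function name is hypothetical. The sizes are those of tracks 1 to 5 as printed above, expressed in hundredths of a MiB to keep the arithmetic exact.

```python
def split_in_order(sizes, target):
    """Fill volumes sequentially, starting a new one when the next item won't fit."""
    volumes = []
    used = 0
    for size in sizes:
        if not volumes or used + size > target:
            volumes.append([])  # open a new volume; the input order is kept
            used = 0
        volumes[-1].append(size)
        used += size
    return volumes

# Tracks 1-5 from the listings above, in hundredths of a MiB.
tracks = [373, 542, 337, 369, 349]
print(split_in_order(tracks, 1000))  # [[373, 542], [337, 369], [349]]
```

The first two volumes match the --split output shown above (tracks 1 and 2, then tracks 3 and 4). Note that in this sketch an item larger than the target simply ends up alone in an oversized volume.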

More features

~$ fit -t 600k --show-size src/* 
src/optimizers  447.38KiB
src/DiskUsage.o 104.19KiB
src/util        32.42KiB
src/Input.cc    4.10KiB
src/Optimizer.hh        3.40KiB
src/Input.hh    3.20KiB
src/Params.hh   3.17KiB
src/DiskUsage.hh        2.15KiB

[1] 600.00KiB/600.00KiB of 1.50MiB, Diff: 1Bytes, Items: 8/19
...

Direct Input

~$ fit -t 3.14 --di --ss '1 id one' '2.4 ID2' '0.3 b' '0.5 foo' '1.23456789 bar'
b       0.3
bar     1.23456789
foo     0.5
id one  1

[1] 3.03456789/3.14 of 5.43456789, Diff: 0.10543211, Items: 4/5

ID2     2.4

[2] 2.4/3.14 of 5.43456789, Diff: 0.74, Items: 1/1

~$ du * | fit - -t 200 --di
optimizers
util/CVS
Makefile
Exception.hh

[1] 200/200 of 536, Diff: 0, Items: 4/22
...

~$ du * | fit - -t 200k --di --di-k
optimizers
util/CVS
Makefile
Exception.hh

[1] 200.00KiB/200.00KiB of 536.00KiB, Diff: 0Bytes, Items: 4/22
...
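
Direct input also makes it easy to drive fit from another program. The Python sketch below formats (size, identifier) pairs as the "size identifier" lines used in the examples above and, optionally, pipes them to fit in the same way as the du pipeline. The helper names are hypothetical, and run_fit assumes that the fit executable is available on the PATH.

```python
import subprocess

def direct_input_lines(items):
    """Format (size, identifier) pairs as 'size identifier' lines for --di mode."""
    return "".join(f"{size} {name}\n" for size, name in items)

def run_fit(items, target):
    """Pipe the pairs to fit, mirroring `du * | fit - -t 200 --di`."""
    result = subprocess.run(
        ["fit", "-", "-t", str(target), "--di"],
        input=direct_input_lines(items),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# The same pairs as in the first Direct Input example.
items = [(1, "id one"), (2.4, "ID2"), (0.3, "b"), (0.5, "foo"), (1.23456789, "bar")]
print(direct_input_lines(items), end="")
```

Calling run_fit(items, 3.14) would then correspond to the first command of this section, with the pairs supplied via stdin instead of as arguments.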

Or if you prefer, some screenshots can be found here. :)

GAFFitter's scripts

The official GAFFitter scripts can be downloaded here.

Creation of ISO9660 image files

Being a filter, GAFFitter can be used for many tasks involving the packing of files and directories. For instance, the Python script gaff-iso creates CD/DVD ISO9660 images using genisoimage or mkisofs. The syntax is as follows:

Usage: gaff-iso --{dvd|dvd-dl|cd|cd74|cd90|cd99|size n} [OPTION] files/dirs

Options:
  --dvd
     create volumes of 4.38GB (DVD)
  --dvd-dl
     create volumes of 7.95GB (Dual Layer DVD)
  --cd
     create volumes of 702MB (80min CD)
  --cd74
     create volumes of 650MB (74min CD)
  --cd90
     create volumes of 790MB (90min CD)
  --cd99
     create volumes of 870MB (99min CD)
  --size n[k,m,g,t]
     create custom volumes of n KiB/MiB/GiB/TiB each
  --split
     just splits the input (i.e. preserves original order)
  --vols n
     maximum number of volumes (default = as much as possible)
  -o dir, --output dir
     output directory for the ISO images
  -v
     verbose mode
  -y
     overwrite any existing output files

Advanced options:
  --mkisofs-opts "opts"
     use custom mkisofs options (default = "-D -r -J -joliet-long")

Example: gaff-iso --dvd *

The generated image files are stored in the current directory (or in the directory given with the -o option) and are named CD-0001.iso, ..., CD-000n.iso for CD images and DVD-0001.iso, ..., DVD-000n.iso for DVD images, where n is the number of volumes (bins) used to pack the given list of files/dirs.

K3B and GAFFitter

Similar to gaff-iso, the usage information for gaff-k3b is:

Usage: gaff-k3b --{dvd|dvd-dl|cd|cd74|cd90|cd99|size n} [OPTION] files/dirs

Options:
  --dvd
     create volumes of 4.38GB (DVD)
  --dvd-dl
     create volumes of 7.95GB (Dual Layer DVD)
  --cd
     create volumes of 702MB (80min CD)
  --cd74
     create volumes of 650MB (74min CD)
  --cd90
     create volumes of 790MB (90min CD)
  --cd99
     create volumes of 870MB (99min CD)
  --size n[k,m,g,t]
     create custom volumes of n KiB/MiB/GiB/TiB each
  --split
     just splits the input (i.e. preserves original order)
  --vols n
     maximum number of volumes (default = as much as possible)
  -v
     verbose mode

Example: gaff-k3b --dvd *

Nautilus Scripts for gaff-k3b

For those who use Nautilus, the following scripts may be useful:

nautilus-gaff-k3b-cd
Packs files/dirs into CD volumes and burns them using K3B
nautilus-gaff-k3b-cd-split
Packs files/dirs into CD volumes and burns them using K3B, but preserves the order
nautilus-gaff-k3b-dvd
Packs files/dirs into DVD volumes and burns them using K3B
nautilus-gaff-k3b-dvd-split
Packs files/dirs into DVD volumes and burns them using K3B, but preserves the order

Installation:

Usage:

  1. Under Nautilus, select the files/dirs to be packed
  2. Right-click the selection and go to the entry Scripts on the context menu
  3. Select one of the above mentioned scripts

Note: these scripts require gaff-k3b (see gaffitter.sf.net), awk, sed and echo.

Resources: K3B website, Nautilus File Manager Scripts

Brasero and GAFFitter

gaff-brasero is a Python script that integrates GAFFitter with the Brasero CD/DVD burner. (Thanks to Mark Edgington)

Usage: gaff-brasero --{dvd|dvd-dl|cd|cd74|cd90|cd99|size n} [OPTION] files/dirs

Options:
  --dvd
     create volumes of 4.38GB (DVD)
  --dvd-dl
     create volumes of 7.95GB (Dual Layer DVD)
  --cd
     create volumes of 702MB (80min CD)
  --cd74
     create volumes of 650MB (74min CD)
  --cd90
     create volumes of 790MB (90min CD)
  --cd99
     create volumes of 870MB (99min CD)
  --size n[k,m,g,t]
     create custom volumes of n KiB/MiB/GiB/TiB each
  --split
     just splits the input (i.e. preserves original order)
  --vols n
     maximum number of volumes (default = as much as possible)
  -v
     verbose mode

Example: gaff-brasero --dvd *

Note: Unlike K3B, Brasero follows symbolic links automatically; at present Brasero doesn't provide a means to disable this behaviour. Also, Brasero can only manage one volume at a time, so when the output spans multiple volumes Brasero is called once per volume, sequentially.

GAFFitter as a filter

Following symbolic links using GNU Disk Usage (du)

GAFFitter doesn't follow symbolic links. However, this can be achieved by using du to obtain the file/dir sizes:

du -Lbs <files/dirs> | fit - --di --di-b <user options>

For example,

fit -t 700m -B 2048 *

is equivalent to

du -Lbs * | fit - --di --di-b -t 700m -B 2048

except for the symbolic link dereference.
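
Both commands above pass -B 2048, which the usage text describes as "the smallest amount of bytes a file can occupy". One plausible reading, assumed here and not confirmed by the source, is that each size is rounded up to a whole number of blocks before packing. The Python sketch below illustrates that reading; the function name is hypothetical.

```python
def occupied_bytes(size, block_size=1):
    """Round a size up to the next multiple of block_size (assumed meaning of -B)."""
    blocks = -(-size // block_size)  # ceiling division on integers
    return blocks * block_size

print(occupied_bytes(1, 2048))     # 2048
print(occupied_bytes(2048, 2048))  # 2048
print(occupied_bytes(2049, 2048))  # 4096
```

With the default block size of 1, every size is left unchanged.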

License

GAFFitter is licensed under the GNU General Public License (GPL) Version 3 (or later), June 2007

Related software