tylerl

Reputation: 30867

Archival filesystem or format

I'm looking for a file format for storing archives of systems that have been decommissioned. At the moment we primarily use tar.gz, but finding and extracting just a few files from a 200GB tar.gz archive is unwieldy, since tar.gz doesn't support any sort of random-access reads. (And before you get the idea, mounting a tgz using FUSE doesn't make it any better.)
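For example (archive name and member path are hypothetical), extracting even a single file from a tar.gz forces a full sequential decompress-and-scan of the archive:

# tar has no index, so it streams through the entire 200GB archive to find one file
tar -xzf old-server.tar.gz ./etc/fstab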

Here's what we've found so far -- I'd like to know what other options there are:

I'm trying to think of a simple way of packing a full-featured filesystem image into as small a space as possible -- e.g. ext2 in a cloop image -- but that doesn't seem like a particularly user-friendly solution.
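For reference, a rough sketch of the loopback-image approach (sizes and paths are invented; the cloop compression step would follow with the cloop tools):

# Create and populate a loopback ext2 image
dd if=/dev/zero of=oldhost.ext2 bs=1M count=4096   # 4GB, sized to fit the data
mke2fs -F oldhost.ext2
mkdir -p /mnt/oldhost
mount -o loop oldhost.ext2 /mnt/oldhost
cp -a /srv/oldhost/. /mnt/oldhost/
umount /mnt/oldhost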

Presumably this problem has been solved before -- are there any options I've missed?

Upvotes: 8

Views: 2740

Answers (5)

Brian Minton

Reputation: 3777

The dar (disk archiver) program is open source, supports compression on a per-file basis, and includes a catalogue for fast seeking to a specific file. It is widely available on a variety of systems. From the FAQ, xattrs and hard links are supported:

Many backup/copy tools do not take care of hard linked inodes (hard linked plain files, named pipes, char devices, block devices, symlinks)... dar does.
Many backup/copy tools do not take care of sparse files... dar does.
Many backup/copy tools do not take care of Extended Attributes... dar does.
Many backup/copy tools do not take care of Posix ACL (Linux)... dar does.
Many backup/copy tools do not take care of file forks (MacOS X)... dar does.
Many backup/copy tools do not take any precautions while working on a live system... dar does.
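A quick sketch of typical usage (archive and path names are made up):

# Create a gzip-compressed archive of /srv/oldhost, split into 2GB slices
dar -c oldhost -R /srv/oldhost -z -s 2G

# List contents using the built-in catalogue, without reading the whole archive
dar -l oldhost

# Restore a single subtree, seeking straight to it
dar -x oldhost -g etc/apache2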

Upvotes: 2

Gabriel

Reputation: 1322

virt-sparsify can be used to sparsify and (through qemu's qcow2 zlib compression support) compress almost any Linux filesystem or disk image. The resulting images can be mounted in a VM, or on the host through guestmount.
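Roughly like this, if I remember the tooling correctly (image names are made up):

# Sparsify a raw image and convert it to compressed qcow2
virt-sparsify --compress --convert qcow2 oldhost.img oldhost.qcow2

# Mount it read-only on the host via libguestfs
guestmount -a oldhost.qcow2 -i --ro /mnt/oldhost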

There's a new nbdkit xz plugin that can be used for higher compression, which still keeps good random-access performance (as long as you ask xz/pixz to reset compression on block boundaries).
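Something along these lines, though the exact nbd-client invocation varies by version (file names, block size and the NBD device are illustrative):

# Compress with per-block resets so the plugin can seek
xz -9 --block-size=16MiB oldhost.img

# Serve the compressed image over NBD
nbdkit xz file=oldhost.img.xz

# Attach it as a block device and mount read-only
modprobe nbd
nbd-client localhost 10809 /dev/nbd0
mount -o ro /dev/nbd0 /mnt/oldhost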

Upvotes: 2

Abe Voelker

Reputation: 31574

ZFS has pretty decent compression capabilities, if memory serves. That said, I've never actually used it. :-)
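If someone wants to try it, the gist is roughly this (pool, disk and dataset names invented):

# Create a pool and a dataset with strong compression, then copy the data in
zpool create archive /dev/sdX
zfs create -o compression=gzip-9 archive/oldhost
cp -a /srv/oldhost/. /archive/oldhost/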

Upvotes: 1

Phillip Lougher

Reputation: 91

Mksquashfs is a highly parallelised program, and makes use of all available cores to maximise performance. If you're seeing very large build times then you either have a lot of duplicate files, or the machine is running short of memory and thrashing.

To investigate performance, you can first:

Use the -no-duplicates option on Mksquashfs, i.e.

mksquashfs xxx xxx.sqsh -no-duplicates

Duplicate checking is a slow operation that has to be done sequentially, and on file sets with a lot of duplicates it becomes a bottleneck in an otherwise parallelised program.

Check memory usage/free memory while Mksquashfs is running; if the system is thrashing, performance will be very low. Investigate the -read-queue, -write-queue and -fragment-queue options to control how much data Mksquashfs caches at run-time.
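For example (queue sizes are in megabytes, and the values here are just starting points):

# Watch memory pressure in another terminal while the build runs
vmstat 1

# Limit Mksquashfs to two cores and smaller caches
mksquashfs /srv/oldhost oldhost.sqsh -processors 2 -read-queue 64 -write-queue 64 -fragment-queue 32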

Tar and zip are not parallelised and use only one core, and so it is difficult to believe your complaint about Mksquashfs compression performance.

Also, I have never seen any other reports that the userspace programs are "poor". Mksquashfs and Unsquashfs have an advanced set of options which allow very fine control over the compression process and over which files are compressed -- options considerably in advance of programs like tar.
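As an illustration of that control (paths, compressor and block size are arbitrary):

# Choose the compressor and block size, and exclude scratch directories
mksquashfs /srv/oldhost oldhost.sqsh -comp xz -b 1M -e tmp cache

# Later, pull out a single file without unpacking the whole image
unsquashfs -d restored oldhost.sqsh etc/fstab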

Unless you can give concrete examples of why the tools are poor, I will put this down to the usual case of the workman blaming the tools, whereas the real problem is elsewhere.

As I said previously, your system is probably thrashing and hence performing badly. By default Mksquashfs uses all available cores, and a minimum of 600 Mbytes of RAM (rising to 2 Gbytes or more on large filesystems). This is for performance, as caching data in memory reduces disk I/O. This "out of the box" behaviour is good for typical users who have large amounts of memory and an otherwise idle system. This is what the majority of users want: a Mksquashfs which "maxes out" the system to achieve as fast as possible filesystem creation.

It is not good for systems with low RAM, or for systems with active processes consuming a large amount of the available CPU, and/or memory. You will simply get resource contention as each process contends for the available CPU and RAM. This is not a fault of Mksquashfs, but of the user.

The Mksquashfs -processors option is there to limit the number of processors Mksquashfs uses; the -read-queue, -write-queue and -fragment-queue options are there to control how much RAM is used by Mksquashfs.

Upvotes: 9

As this is Stack Overflow, I assume you are looking for a library or code. In that case you could check our SolFS virtual file system. It doesn't support hard links, but alternate streams are supported (for xattrs) and tags are supported (for Unix attributes). Symlinks are also supported, so you can convert hard links to symlinks when building the archive.

Upvotes: -1
