ecm

Reputation: 2755

How does 7-Zip find a .7z archive at an offset within a file?

I found that I can sandwich a .7z archive between other binary data, and 7za is still able to list (and presumably extract) the files in the archive. I think this is useful, e.g. to create a quintuple-format file: a DOS kernel, DOS application, and DOS device driver all at the beginning, a .7z (or .zip) archive in the middle, and an appended Extension for lDebug (ELD) at the end.

I'd like to find the exact conditions under which the archive is found, but this does not seem to be documented anywhere. For instance, https://py7zr.readthedocs.io/en/latest/archive_format.html lists:

Signature

The first six bytes of a 7-zip file SHALL always contain b'7z\xbc\xaf\x27\x1c'.

This may describe the format of a standalone archive, but my test shows that the archive is also found when it is sandwiched, i.e. when the signature does not appear at the very beginning of the containing file.
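To make concrete what "found at an offset" means, here is a minimal Python sketch that scans a byte string for the six signature bytes and reports every offset where they occur. It only illustrates a naive search and is not a description of how 7-Zip itself searches; the function name `find_signature_offsets` is made up for this example.

```python
SIGNATURE = b"7z\xbc\xaf\x27\x1c"  # the six signature bytes quoted above


def find_signature_offsets(data: bytes) -> list:
    """Return every offset in data at which the 7z signature occurs."""
    offsets = []
    pos = data.find(SIGNATURE)
    while pos != -1:
        offsets.append(pos)
        # Continue searching one byte past the previous hit.
        pos = data.find(SIGNATURE, pos + 1)
    return offsets


if __name__ == "__main__":
    # Hypothetical stand-in for a sandwiched file: padding, signature, padding.
    blob = b"\x00" * 10240 + SIGNATURE + b"\x00" * 76 + b"\x00" * 10240
    print(find_signature_offsets(blob))
```

A bare byte match like this can also hit six coincidental bytes inside unrelated data, so a real reader presumably needs some further check before accepting a candidate.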

The format description in the 7-Zip sources likewise does not specify how the signature is searched for: https://github.com/ip7z/7zip/blob/e5431fa6f5505e385c6f9367260717e9c47dc2ee/DOC/7zFormat.txt#L168

7z format headers

SignatureHeader

BYTE kSignature[6] = {'7', 'z', 0xBC, 0xAF, 0x27, 0x1C};
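In 7zFormat.txt the signature is followed by a two-byte ArchiveVersion, a UINT32 StartHeaderCRC, and a 20-byte StartHeader (NextHeaderOffset, NextHeaderSize, NextHeaderCRC), for 32 bytes in total. The Python sketch below parses that layout at a given offset and uses the CRC to reject coincidental signature matches. It assumes StartHeaderCRC is the same CRC-32 that `zlib.crc32` computes; whether 7-Zip applies this check while searching is also an assumption here, and `parse_signature_header` is a hypothetical helper name.

```python
import struct
import zlib

SIGNATURE = b"7z\xbc\xaf\x27\x1c"
HEADER_SIZE = 32  # 6 signature + 2 version + 4 CRC + 20 start header


def parse_signature_header(data: bytes, offset: int = 0):
    """Parse the 32-byte signature header located at offset.

    Returns (next_header_offset, next_header_size, physical_size),
    or None if the bytes do not form a plausible header.
    """
    header = data[offset:offset + HEADER_SIZE]
    if len(header) < HEADER_SIZE or header[:6] != SIGNATURE:
        return None
    # Bytes 6..7 are the archive version; bytes 8..11 are StartHeaderCRC.
    (start_header_crc,) = struct.unpack_from("<I", header, 8)
    start_header = header[12:32]
    # Assumption: StartHeaderCRC is a zlib-compatible CRC-32 of these 20 bytes.
    if zlib.crc32(start_header) & 0xFFFFFFFF != start_header_crc:
        return None
    next_offset, next_size, _next_crc = struct.unpack("<QQI", start_header)
    # NextHeaderOffset counts from the end of the 32-byte signature header.
    return next_offset, next_size, HEADER_SIZE + next_offset + next_size
```

The third returned value is the total archive size implied by the header, which is the kind of number a reader needs in order to tell where the archive ends and trailing data begins.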

I looked through the sources but was unable to find exactly where this is handled. These bits seem relevant:

https://github.com/ip7z/7zip/blob/e5431fa6f5505e385c6f9367260717e9c47dc2ee/CPP/7zip/Archive/7z/7zRegister.cpp#L18

static Byte k_Signature_Dec[kSignatureSize] = {'7' + 1, 'z', 0xBC, 0xAF, 0x27, 0x1C};

REGISTER_ARC_IO_DECREMENT_SIG(
  "7z", "7z", NULL, 7,
  k_Signature_Dec,
  0,
    NArcInfoFlags::kFindSignature
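The array above stores '7' + 1 as its first byte, so the bytes in the source differ from the real signature. The following Python sketch shows the relationship under the assumption, suggested by the macro name REGISTER_ARC_IO_DECREMENT_SIG but not verified here, that the first byte is decremented once at registration time. The helper name `decrement_first_byte` is made up for this illustration.

```python
# Bytes as written in the quoted C++ array: '7' + 1 is the character '8'.
K_SIGNATURE_DEC = bytes([ord("7") + 1]) + b"z\xbc\xaf\x27\x1c"


def decrement_first_byte(stored: bytes) -> bytes:
    """Return stored with its first byte reduced by one (modulo 256)."""
    return bytes([(stored[0] - 1) & 0xFF]) + stored[1:]


if __name__ == "__main__":
    print(K_SIGNATURE_DEC)                        # stored form, starts with '8'
    print(decrement_first_byte(K_SIGNATURE_DEC))  # starts with '7'
```

If that assumption holds, one plausible purpose is to keep the literal six-byte signature out of the compiled binary, so that a signature search does not match the program itself.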

The kFindSignature flag seems to be read in https://github.com/ip7z/7zip/blob/e5431fa6f5505e385c6f9367260717e9c47dc2ee/CPP/7zip/UI/Common/LoadCodecs.h#L145

bool Flags_FindSignature() const { return (Flags & NArcInfoFlags::kFindSignature) != 0; }

That accessor is used in https://github.com/ip7z/7zip/blob/e5431fa6f5505e385c6f9367260717e9c47dc2ee/CPP/7zip/UI/Common/OpenArchive.cpp#L2115

    {
      const CArcInfoEx &ai = op.codecs->Formats[(unsigned)formatIndex];
      if (ai.FindExtension(extension) >= 0)
      {
        if (ai.Flags_FindSignature() && searchMarkerInHandler)
          return S_FALSE;
      }
    }

However, I don't understand how this implements a search for the signature.


Here's a test case using 7za v16.02 on a Linux host's command line:

$ dd if=/dev/random of=prefix bs=1024 count=10
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.000207353 s, 49.4 MB/s
$ dd if=/dev/random of=suffix bs=1024 count=10
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.000172483 s, 59.4 MB/s
$ touch foo
$ 7za -t7z a foo.7z foo

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs AMD EPYC-Rome Processor (830F10),ASM,AES-NI)

Scanning the drive:
1 file, 0 bytes

Creating archive: foo.7z

Items to compress: 1


Files read from disk: 0
Archive size: 82 bytes (1 KiB)
Everything is Ok
$ cat prefix foo.7z suffix > middle.bin
$ 7za l middle.bin

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs AMD EPYC-Rome Processor (830F10),ASM,AES-NI)

Scanning the drive for archives:
1 file, 20562 bytes (21 KiB)

Listing archive: middle.bin

--
Path = middle.bin
Type = 7z
WARNINGS:
There are data after the end of archive
Offset = 10240
Physical Size = 82
Tail Size = 10240
Headers Size = 82
Solid = -
Blocks = 0

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2024-12-25 21:26:30 ....A            0            0  foo
------------------- ----- ------------ ------------  ------------------------
2024-12-25 21:26:30                  0            0  1 files

Warnings: 1
$
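The numbers in the listing are consistent with each other: the reported Offset, Physical Size, and Tail Size add up to the 20562 bytes of middle.bin. A small Python sketch of that arithmetic, with `tail_size` as a hypothetical helper name:

```python
def tail_size(file_size: int, offset: int, physical_size: int) -> int:
    """Bytes remaining after an embedded archive that starts at offset."""
    tail = file_size - offset - physical_size
    if tail < 0:
        raise ValueError("archive extends past the end of the file")
    return tail


if __name__ == "__main__":
    # Values taken from the 7za listing above.
    print(tail_size(file_size=20562, offset=10240, physical_size=82))
```

With the values from the listing this gives 10240, matching the Tail Size that 7za reports along with its "data after the end of archive" warning.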

Question: What are the exact conditions for finding the .7z archive in the middle of a file?

