Tobias Knauss

Reputation: 3509

SevenZipSharp fails to unpack certain tar archives

I use SevenZipSharp for packing to 7z archives and unpacking from all kinds of archives. It has worked really well for years.

Today I had a .tgz archive that failed to unpack in the second stage:
Extracting the .tar from the .tgz worked, but unpacking the .tar failed. Only this single archive is affected; all other .tgz files work well. The .tar itself is not faulty, because unpacking it with the 7-Zip application works, too.

Upvotes: 3

Views: 507

Answers (1)

Tobias Knauss

Reputation: 3509

After a lot of testing, a colleague and I found the reason:
We had to debug the SevenZipSharp DLL to find the failure in it. The DLL detects the type of an archive by reading the first 16 bytes and comparing them to a list of signatures. This is correct for most archive types, but wrong for .tar archives, because the .tar file header starts with a file name rather than a signature: TAR @ Wikipedia. The signature "ustar", if present, is located at offset 257 (0x0101).
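To make the layout concrete, here is a minimal sketch of a tar header block, assuming the usual 512-byte ustar layout with the name field in the first 100 bytes. The class and method names (`TarHeaderDemo`, `BuildHeader`, `HasUstarMagic`) are made up for illustration and are not part of SevenZipSharp; all other header fields are simply left zeroed.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of SevenZipSharp.
public static class TarHeaderDemo
{
    public const int HeaderSize = 512;   // one tar header block
    public const int NameFieldSize = 100; // name field at offset 0
    public const int MagicOffset = 257;  // 0x0101, where "ustar" is stored

    // Builds a simplified header: name at offset 0, "ustar" at offset 257.
    // Mode, size, checksum and the other fields are left as zero bytes.
    public static byte[] BuildHeader(string name)
    {
        var header = new byte[HeaderSize];
        byte[] nameBytes = Encoding.ASCII.GetBytes(name);
        Array.Copy(nameBytes, 0, header, 0, Math.Min(nameBytes.Length, NameFieldSize));
        byte[] magic = Encoding.ASCII.GetBytes("ustar");
        Array.Copy(magic, 0, header, MagicOffset, magic.Length);
        return header;
    }

    // Returns true if the five bytes at offset 257 spell "ustar".
    public static bool HasUstarMagic(byte[] header)
    {
        if (header.Length < MagicOffset + 5)
        {
            return false;
        }
        return Encoding.ASCII.GetString(header, MagicOffset, 5) == "ustar";
    }
}
```

With `BuildHeader("x42202.tar")`, the first byte of the block is 0x78 ("x"), so anything that only looks at the first few bytes sees the name, not a signature.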

SevenZipSharp knows that and checks for "ustar" at this offset, but only if the previous detection has failed. Unfortunately, our TAR archive's name was "x42202.tar", and the signature that SevenZipSharp uses for .dmg files (Apple Disk Image) consists of a single "x" (how stupid is that, to use only one byte as a signature??). So a file type actually was detected successfully; the detection result was just wrong.
(I know the linked Wikipedia article says the .dmg header signature is "koly", but I confirmed the "x" with a downloaded .dmg file that I found on the internet.)
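The false match described above can be reproduced with a few lines. This is only a sketch: the signature table below is a small, assumed subset written for illustration (the "78" entry for Dmg reflects the behavior described in this answer, not a copy of the library's table), and `PrefixMatchDemo` and `Detect` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical demo of prefix-based signature matching; not SevenZipSharp code.
public static class PrefixMatchDemo
{
    // Assumed subset of a signature table, keyed by the hex string
    // that BitConverter.ToString produces ("78" is the byte for 'x').
    public static readonly Dictionary<string, string> Signatures =
        new Dictionary<string, string>
        {
            { "37-7A-BC-AF-27-1C", "SevenZip" },
            { "50-4B-03-04", "Zip" },
            { "78", "Dmg" },
        };

    // Returns the first format whose signature is a prefix of the data.
    public static string Detect(byte[] firstBytes)
    {
        string actual = BitConverter.ToString(firstBytes);
        foreach (KeyValuePair<string, string> pair in Signatures)
        {
            if (actual.StartsWith(pair.Key, StringComparison.OrdinalIgnoreCase))
            {
                return pair.Value;
            }
        }
        return "Unknown";
    }
}
```

Calling `PrefixMatchDemo.Detect(Encoding.ASCII.GetBytes("x42202.tar"))` reports "Dmg", because the hex string of the name starts with "78". A one-byte signature matches every file whose first byte happens to be "x".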

EDIT 07.12.2021: The signature actually is "koly", but the so-called header is 512 bytes long and located at the END of the file. SevenZipSharp however checks for a signature at the beginning. Most files (but not all!) that I tested indeed have an "x" at the beginning, but I cannot tell why. Maybe it is an unofficial kind of header ("x" seems to come from the MIME type "x-apple-diskimage"). - End of EDIT.
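For comparison, a check that looks for "koly" where the EDIT says it lives could look like the sketch below. It assumes that the 512-byte block sits at the very end of the file and begins with the four ASCII bytes "koly"; `DmgTrailerDemo` and `HasKolyTrailer` are hypothetical names and nothing like this exists in SevenZipSharp as far as I know.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of SevenZipSharp.
public static class DmgTrailerDemo
{
    public const int TrailerSize = 512; // assumed size of the block at the end

    // Returns true if the last 512 bytes of the stream start with "koly".
    // The stream position is restored before returning.
    public static bool HasKolyTrailer(Stream stream)
    {
        if (!stream.CanSeek || stream.Length < TrailerSize)
        {
            return false;
        }
        long savedPosition = stream.Position;
        try
        {
            stream.Seek(-TrailerSize, SeekOrigin.End);
            var magic = new byte[4];
            int read = 0;
            while (read < magic.Length)
            {
                int n = stream.Read(magic, read, magic.Length - read);
                if (n == 0)
                {
                    return false; // unexpected end of stream
                }
                read += n;
            }
            return Encoding.ASCII.GetString(magic) == "koly";
        }
        finally
        {
            stream.Position = savedPosition;
        }
    }
}
```

A check like this does not depend on the first byte of the file at all, so a tar whose header begins with "x" would not be mistaken for a disk image.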

Therefore we modified the code in FileSignatureChecker.cs to avoid false archive type detection for .tar archives.
Below you will find the original and the modified code.
The code base is the latest SevenZipSharp version that can be found in the CodePlex archive. Apparently it is not under active development any more, because the version number hasn't changed in years, and if it were still active it would have moved elsewhere when CodePlex was retired.

Update 2018-11-16
Bugfix in the modified code: enSpecialFormat was not returned when it was found.

Update 2021-12-16
The bug is still present in the GitHub repository https://github.com/squid-box/SevenZipSharp, which is the current location of the SevenZipSharp project. A pull request with a major rework of the faulty code has been submitted and is waiting to be merged.

original code

public static InArchiveFormat CheckSignature (Stream stream, out int offset, out bool isExecutable)
{
  offset = 0;
  if (!stream.CanRead)
  {
    throw new ArgumentException ("The stream must be readable.");
  }
  if (stream.Length < SIGNATURE_SIZE)
  {
    throw new ArgumentException ("The stream is invalid.");
  }

  #region Get file signature

  var signature = new byte[SIGNATURE_SIZE];
  int bytesRequired = SIGNATURE_SIZE;
  int index = 0;
  stream.Seek (0, SeekOrigin.Begin);
  while (bytesRequired > 0)
  {
    int bytesRead = stream.Read (signature, index, bytesRequired);
    bytesRequired -= bytesRead;
    index += bytesRead;
  }
  string actualSignature = BitConverter.ToString (signature);

  #endregion

  InArchiveFormat suspectedFormat = InArchiveFormat.XZ; // any except PE and Cab
  isExecutable = false;

  foreach (string expectedSignature in Formats.InSignatureFormats.Keys)
  {
    if (actualSignature.StartsWith (expectedSignature, StringComparison.OrdinalIgnoreCase) ||
        actualSignature.Substring (6).StartsWith (expectedSignature, StringComparison.OrdinalIgnoreCase) &&
        Formats.InSignatureFormats[expectedSignature] == InArchiveFormat.Lzh)
    {
      if (Formats.InSignatureFormats[expectedSignature] == InArchiveFormat.PE)
      {
        suspectedFormat = InArchiveFormat.PE;
        isExecutable = true;
      }
      else
      {
        return Formats.InSignatureFormats[expectedSignature];
      }
    }
  }

  // Many Microsoft formats
  if (actualSignature.StartsWith ("D0-CF-11-E0-A1-B1-1A-E1", StringComparison.OrdinalIgnoreCase))
  {
    suspectedFormat = InArchiveFormat.Cab; // != InArchiveFormat.XZ
  }

  #region SpecialDetect
  try
  {
    SpecialDetect (stream, 257, InArchiveFormat.Tar);
  }
  catch (ArgumentException) { }
  if (SpecialDetect (stream, 0x8001, InArchiveFormat.Iso))
  {
    return InArchiveFormat.Iso;
  }
  if (SpecialDetect (stream, 0x8801, InArchiveFormat.Iso))
  {
    return InArchiveFormat.Iso;
  }
  if (SpecialDetect (stream, 0x9001, InArchiveFormat.Iso))
  {
    return InArchiveFormat.Iso;
  }
  if (SpecialDetect (stream, 0x9001, InArchiveFormat.Iso))
  {
    return InArchiveFormat.Iso;
  }
  if (SpecialDetect (stream, 0x400, InArchiveFormat.Hfs))
  {
    return InArchiveFormat.Hfs;
  }
  #region Last resort for tar - can mistake
  if (stream.Length >= 1024)
  {
    stream.Seek (-1024, SeekOrigin.End);
    byte[] buf = new byte[1024];
    stream.Read (buf, 0, 1024);
    bool istar = true;
    for (int i = 0; i < 1024; i++)
    {
      istar = istar && buf[i] == 0;
    }
    if (istar)
    {
      return InArchiveFormat.Tar;
    }
  }
  #endregion
  #endregion

  #region Check if it is an SFX archive or a file with an embedded archive.
  if (suspectedFormat != InArchiveFormat.XZ)
  {
    #region Get first Min(stream.Length, SFX_SCAN_LENGTH) bytes
    var scanLength = Math.Min (stream.Length, SFX_SCAN_LENGTH);
    signature = new byte[scanLength];
    bytesRequired = (int)scanLength;
    index = 0;
    stream.Seek (0, SeekOrigin.Begin);
    while (bytesRequired > 0)
    {
      int bytesRead = stream.Read (signature, index, bytesRequired);
      bytesRequired -= bytesRead;
      index += bytesRead;
    }
    actualSignature = BitConverter.ToString (signature);
    #endregion

    foreach (var format in new InArchiveFormat[]
    {
                    InArchiveFormat.Zip,
                    InArchiveFormat.SevenZip,
                    InArchiveFormat.Rar,
                    InArchiveFormat.Cab,
                    InArchiveFormat.Arj
    })
    {
      int pos = actualSignature.IndexOf (Formats.InSignatureFormatsReversed[format]);
      if (pos > -1)
      {
        offset = pos / 3;
        return format;
      }
    }
    // Nothing
    if (suspectedFormat == InArchiveFormat.PE)
    {
      return InArchiveFormat.PE;
    }
  }
  #endregion

  throw new ArgumentException ("The stream is invalid or no corresponding signature was found.");
}

modified code

public static InArchiveFormat CheckSignature (Stream stream, out int offset, out bool isExecutable)
{
  offset = 0;
  if (!stream.CanRead)
  {
    throw new ArgumentException ("The stream must be readable.");
  }
  if (stream.Length < SIGNATURE_SIZE)
  {
    throw new ArgumentException ("The stream is invalid.");
  }

  #region Get file signature

  var signature = new byte[SIGNATURE_SIZE];
  int bytesRequired = SIGNATURE_SIZE;
  int index = 0;
  stream.Seek (0, SeekOrigin.Begin);
  while (bytesRequired > 0)
  {
    int bytesRead = stream.Read (signature, index, bytesRequired);
    bytesRequired -= bytesRead;
    index += bytesRead;
  }
  string actualSignature = BitConverter.ToString (signature);

  #endregion Get file signature

  InArchiveFormat suspectedFormat = InArchiveFormat.XZ; // any except PE and Cab
  isExecutable = false;

  InArchiveFormat enDetectedFormat = (InArchiveFormat)(-1);
  InArchiveFormat enSpecialFormat = (InArchiveFormat)(-1);

  foreach (string expectedSignature in Formats.InSignatureFormats.Keys)
  {
    if (actualSignature.StartsWith (expectedSignature, StringComparison.OrdinalIgnoreCase) ||
        actualSignature.Substring (6).StartsWith (expectedSignature, StringComparison.OrdinalIgnoreCase) &&
        Formats.InSignatureFormats[expectedSignature] == InArchiveFormat.Lzh)
    {
      if (Formats.InSignatureFormats[expectedSignature] == InArchiveFormat.PE)
      {
        suspectedFormat = InArchiveFormat.PE;
        isExecutable = true;
      }
      else
      {
        enDetectedFormat = Formats.InSignatureFormats[expectedSignature];
        break;
      }
    }
  }

  // Many Microsoft formats
  if (actualSignature.StartsWith ("D0-CF-11-E0-A1-B1-1A-E1", StringComparison.OrdinalIgnoreCase))
  {
    suspectedFormat = InArchiveFormat.Cab; // != InArchiveFormat.XZ
  }

  #region SpecialDetect

  if (SpecialDetect (stream, 257, InArchiveFormat.Tar))
  {
    enSpecialFormat = InArchiveFormat.Tar;
  }
  else if (SpecialDetect (stream, 0x8001, InArchiveFormat.Iso))
  {
    enSpecialFormat = InArchiveFormat.Iso;
  }
  else if (SpecialDetect (stream, 0x8801, InArchiveFormat.Iso))
  {
    enSpecialFormat = InArchiveFormat.Iso;
  }
  else if (SpecialDetect (stream, 0x9001, InArchiveFormat.Iso))
  {
    enSpecialFormat = InArchiveFormat.Iso;
  }
  else if (SpecialDetect (stream, 0x9001, InArchiveFormat.Iso))
  {
    enSpecialFormat = InArchiveFormat.Iso;
  }
  else if (SpecialDetect (stream, 0x400, InArchiveFormat.Hfs))
  {
    enSpecialFormat = InArchiveFormat.Hfs;
  }

  #region Last resort for tar - can mistake

  bool bPossiblyTAR = false;
  if (stream.Length >= 1024)
  {
    stream.Seek (-1024, SeekOrigin.End);
    byte[] buf = new byte[1024];
    stream.Read (buf, 0, 1024);
    bPossiblyTAR = true;
    for (int i = 0; i < 1024; i++)
    {
      bPossiblyTAR = bPossiblyTAR && buf[i] == 0;
    }
  }

  // TAR header starts with the filename of the archive.
  // The filename can be anything, including the Identifiers of the various archive formats.
  // This means that a TAR can be misinterpreted as any type of archive.
  if (enSpecialFormat == InArchiveFormat.Tar
  || bPossiblyTAR)
  {
    var fs = stream as FileStream;
    if (fs != null)
    {
      string sStreamFilename = fs.Name;
      if (sStreamFilename.EndsWith (".tar", StringComparison.InvariantCultureIgnoreCase))
        enDetectedFormat = InArchiveFormat.Tar;
    }
  }

  #endregion Last resort for tar - can mistake

  if (enDetectedFormat != (InArchiveFormat)(-1))
    return enDetectedFormat;
  if (enSpecialFormat != (InArchiveFormat)(-1))
    return enSpecialFormat;

  #endregion SpecialDetect

  #region Check if it is an SFX archive or a file with an embedded archive.

  if (suspectedFormat != InArchiveFormat.XZ)
  {
    #region Get first Min(stream.Length, SFX_SCAN_LENGTH) bytes

    var scanLength = Math.Min (stream.Length, SFX_SCAN_LENGTH);
    signature = new byte[scanLength];
    bytesRequired = (int)scanLength;
    index = 0;
    stream.Seek (0, SeekOrigin.Begin);
    while (bytesRequired > 0)
    {
      int bytesRead = stream.Read (signature, index, bytesRequired);
      bytesRequired -= bytesRead;
      index += bytesRead;
    }
    actualSignature = BitConverter.ToString (signature);

    #endregion Get first Min(stream.Length, SFX_SCAN_LENGTH) bytes

    foreach (var format in new InArchiveFormat[]
    {
                InArchiveFormat.Zip,
                InArchiveFormat.SevenZip,
                InArchiveFormat.Rar,
                InArchiveFormat.Cab,
                InArchiveFormat.Arj
    })
    {
      int pos = actualSignature.IndexOf (Formats.InSignatureFormatsReversed[format]);
      if (pos > -1)
      {
        offset = pos / 3;
        return format;
      }
    }
    // Nothing
    if (suspectedFormat == InArchiveFormat.PE)
    {
      return InArchiveFormat.PE;
    }
  }

  #endregion Check if it is an SFX archive or a file with an embedded archive.

  throw new ArgumentException ("The stream is invalid or no corresponding signature was found.");
}

Upvotes: 6
