Reputation: 31
I am using System.IO.Compression to extract the contents of some zip files. The problem is that whenever an entry has a filename containing characters that are illegal on Windows, an exception is thrown. I have tried several things but have not found a way to skip the bad entries and extract the good ones. Please note that modifying the zip file is not an option for the type of processing we perform, so I must process the file as it is.
The system usually processes files with several entries. The number varies, but there can be up to 300 entries in one zip file, and occasionally there is an entry with a filename such as 'myfile<name>.txt', which contains angle brackets that are illegal in Windows file names. I want to skip this entry and move on to extract the rest of the entries in the ZipArchive, but it looks like this is not possible.
Any idea on how to disregard the bad entries of a ZipArchive?
So far I have tried different ways to get the entries separately, but I always get the exact same exception.
Here are some of the things I have tried so far:
Implementing the regular way to iterate over the entries:
foreach (ZipArchiveEntry entry in zipArchive.Entries)
Trying to get only one entry by index (same exception here even though the first entry is a valid one):
ZipArchiveEntry entry = zipArchive.Entries[0];
Applying a filter using a lambda expression to disregard the invalid entries (same exception also):
var entries = zipArchive.Entries.Where(a =>
a.FullName.IndexOfAny(Path.GetInvalidFileNameChars() ) == -1);
None of this helps, and the exception I get every single time is as follows:
   at System.IO.Path.CheckInvalidPathChars(String path, Boolean checkAdditional)
   at System.IO.Path.GetFileName(String path)
   at System.IO.Compression.ZipHelper.EndsWithDirChar(String test)
   at System.IO.Compression.ZipArchiveEntry.set_FullName(String value)
   at System.IO.Compression.ZipArchiveEntry..ctor(ZipArchive archive, ZipCentralDirectoryFileHeader cd)
   at System.IO.Compression.ZipArchive.ReadCentralDirectory()
   at System.IO.Compression.ZipArchive.get_Entries()
   at ZipLibraryConsole.MicrosoftExtraction.RecursiveExtract(Stream fileToExtract, Int32 maxDepthLevel, Attachment att) in C:\Users\myUser\Documents\Visual Studio 2015\Projects\ZipLibraryConsole\ZipLibraryConsole\MicrosoftExtraction.cs:line 47
This is a snippet of the implemented code:
var zipArchive = new ZipArchive(fileToExtract, ZipArchiveMode.Read);
try
{
    // The exception is thrown here, so there is no chance to process the valid entries at all.
    foreach (var zipEntry in zipArchive.Entries)
    {
        // Do something and extract the file
    }
}
catch (ArgumentException exception)
{
    Console.WriteLine(
        String.Format("Failed to complete the extraction. At least one path contains invalid characters for the Operating System: {0}{1}", att.Name, att.Extention));
}
Upvotes: 3
Views: 4321
Reputation: 906
Using System.Reflection you can at least hide the errors, although you only get entries up to the one with the path containing illegal characters.
Add this class and use archive.GetRawEntries() instead of archive.Entries:
using System.Collections.Generic;
using System.IO.Compression;
using System.Reflection;

public static class ZipArchiveHelper
{
    private static FieldInfo _Entries;
    private static MethodInfo _EnsureDirRead;

    static ZipArchiveHelper()
    {
        // Look up the private members of ZipArchive once.
        _Entries = typeof(ZipArchive).GetField("_entries", BindingFlags.NonPublic | BindingFlags.Instance);
        _EnsureDirRead = typeof(ZipArchive).GetMethod("EnsureCentralDirectoryRead", BindingFlags.NonPublic | BindingFlags.Instance);
    }

    public static List<ZipArchiveEntry> GetRawEntries(this ZipArchive archive)
    {
        // Reading stops at the first bad entry; the exception is swallowed on purpose.
        try { _EnsureDirRead.Invoke(archive, null); } catch { }
        // Return whatever entries were read before the failure.
        return (List<ZipArchiveEntry>)_Entries.GetValue(archive);
    }
}
The try-catch is ugly, and you could catch a specific exception if it bugs you. According to the comments above, this is fixed in .NET Core. (UPDATE: Confirmed this is fixed in .NET Core 3.1, maybe earlier.)
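If you do want to catch something narrower, note that MethodInfo.Invoke wraps whatever the target method throws in a TargetInvocationException, so the ArgumentException from the stack trace in the question arrives as its InnerException. Below is a minimal sketch of such a variant. The class name ZipArchiveStrictHelper and the method name GetRawEntriesStrict are hypothetical, and the fallback to archive.Entries when the private members cannot be found is my own assumption about what is reasonable, not something the linked sources prescribe.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;
using System.Reflection;

// Hypothetical variant of the helper above; the names are illustrative only.
public static class ZipArchiveStrictHelper
{
    private const BindingFlags Flags = BindingFlags.NonPublic | BindingFlags.Instance;

    // Private members of ZipArchive; they are not guaranteed to exist on every runtime.
    private static readonly FieldInfo EntriesField =
        typeof(ZipArchive).GetField("_entries", Flags);
    private static readonly MethodInfo EnsureDirRead =
        typeof(ZipArchive).GetMethod("EnsureCentralDirectoryRead", Flags);

    public static List<ZipArchiveEntry> GetRawEntriesStrict(this ZipArchive archive)
    {
        // Assumption: fall back to the public API if reflection finds nothing.
        if (EntriesField == null || EnsureDirRead == null)
            return archive.Entries.ToList();

        try
        {
            EnsureDirRead.Invoke(archive, null);
        }
        catch (TargetInvocationException ex) when (ex.InnerException is ArgumentException)
        {
            // Invoke wraps the original exception; only the invalid-path case is swallowed.
        }

        // Entries read before the failure (or all of them if nothing failed).
        var raw = EntriesField.GetValue(archive) as List<ZipArchiveEntry>;
        return raw ?? archive.Entries.ToList();
    }
}
```

It is used the same way as the original helper, for example foreach (var entry in zipArchive.GetRawEntriesStrict()). Any other exception type, such as one caused by a corrupt archive, still propagates to the caller.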
Credit for this (partial) fix goes to https://www.codeproject.com/Tips/1007398/Avoid-Illegal-Characters-in-Path-error-in-ZipArchi and https://gist.github.com/rdavisau/b66df9c99a4b11c5ceff
More pointers on fixing paths with illegal characters (not just in zip files) are in the question ZipFile.ExtractToDirectory "Illegal characters in path".
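For the broader problem of illegal characters in names, one common approach is to replace the offending characters before a file is written to disk. The sketch below is only an illustration of that idea: the class name EntryNameSanitizer and the underscore replacement are hypothetical choices, and the character list is hard-coded to the ones Windows rejects so that the result does not depend on the platform the code runs on. It is meant for the file-name part only, since it also replaces '/' and '\\'.

```csharp
using System;
using System.Linq;

// Hypothetical helper; the name and the replacement character are illustrative.
public static class EntryNameSanitizer
{
    // Characters Windows rejects in a file name, in addition to control characters.
    private static readonly char[] WindowsInvalid =
        { '<', '>', ':', '"', '/', '\\', '|', '?', '*' };

    public static string Sanitize(string fileName, char replacement = '_')
    {
        // Replace control characters (below 32) and every listed character.
        var cleaned = fileName.Select(c =>
            (c < 32 || Array.IndexOf(WindowsInvalid, c) >= 0) ? replacement : c);
        return new string(cleaned.ToArray());
    }
}
```

With the filename from the question, EntryNameSanitizer.Sanitize("myfile<name>.txt") yields "myfile_name_.txt". Note that this changes the name on disk, so two different entries could end up with the same sanitized name.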
Upvotes: 0