White Angel

Reputation: 1

How to get directory info when directory is in .zip archive?

My path is C:\Users\xx\Desktop\Folder\E_sa_sub.zip\E_sa_sub\subbb
The problem is the E_sa_sub.zip part of the path.

When I called DirectoryInfo.GetDirectories(), I got the error 'Could not find a part of the path'.

    List<DirectoryInfo> arr = new List<DirectoryInfo>();

    private void SubFoldersFiles(string path)
    {
        DirectoryInfo dInfo = new DirectoryInfo(path);
        foreach (DirectoryInfo d in dInfo.GetDirectories())
        {
            SubFoldersFiles(d.FullName);
            arr.Add(d);
        }
    }

Upvotes: 0

Views: 3722

Answers (2)

Peter Duniho

Reputation: 70652

A .zip archive file is not actually a Windows file system directory, so what you're trying to do simply won't work. In addition, a .zip archive file doesn't even really have directories in it. What it has are archive entries with names, where these names can include path separator characters (unlike a file or directory name on a Windows file system like NTFS or FAT, which cannot contain them). At most, an archive may contain empty placeholder entries whose names end with a separator.
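To see that the archive really is just a flat list of names, here is a minimal sketch that builds a small archive in memory and reads the names back through System.IO.Compression. The class name, method names, and entry names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class EntryNameDemo
{
    // Builds a tiny archive in memory. The entry names are made up for this demo.
    public static MemoryStream BuildSampleArchive()
    {
        var stream = new MemoryStream();
        using (var archive = new ZipArchive(stream, ZipArchiveMode.Create, leaveOpen: true))
        {
            archive.CreateEntry("E_sa_sub/readme.txt");
            archive.CreateEntry("E_sa_sub/subbb/data.txt");
        }

        stream.Position = 0;
        return stream;
    }

    // Returns the full stored name of every entry, in archive order.
    public static List<string> ListEntryNames(Stream zipStream)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            return archive.Entries.Select(e => e.FullName).ToList();
        }
    }
}
```

Only the two file entries come back. Nothing named "E_sa_sub/" or "E_sa_sub/subbb/" exists unless the tool that created the archive chose to write such placeholder entries.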

When the name has path separator characters in it, archive manipulation tools like Windows' built-in .zip handling features or a program like 7zip treat the entry name as including a full directory path, and use the path separator characters to determine what the virtual path for the entry would be when stored in a real file system.

In the case of Windows Explorer (i.e. the GUI shell for Windows), when you double-click a .zip archive file, it opens it and displays the contents in a window that looks exactly like a regular file system window. But that's not really what it is, and you can't use the usual file system navigation classes like Directory and File to access it.

If you want to treat the archive as if it has directories within it, you have to examine the entry names yourself and build an in-memory data structure (e.g. a tree) that represents the directory structure they imply. You can either build the entire tree at once, or work one level at a time: strip the current path prefix from each entry name and look at what remains. If the remainder contains no path separator character, the entry is a file at that level; if it contains at least one, its first path component is a directory at that level.

By recursively parsing the entry names, you can navigate the .zip archive that way.
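The "build the entire tree at once" option could look roughly like the sketch below. ZipTreeNode and ZipTreeBuilder are hypothetical names, not part of .NET, and the method takes plain entry names so that it can be fed from archive.Entries.Select(e => e.FullName).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type for this sketch; not part of System.IO.Compression.
public sealed class ZipTreeNode
{
    public string Name { get; }

    public SortedDictionary<string, ZipTreeNode> Directories { get; } =
        new SortedDictionary<string, ZipTreeNode>(StringComparer.Ordinal);

    public List<string> Files { get; } = new List<string>();

    public ZipTreeNode(string name) { Name = name; }
}

public static class ZipTreeBuilder
{
    private static readonly char[] Separators = { '/', '\\' };

    // Builds the directory tree implied by a set of entry names.
    public static ZipTreeNode Build(IEnumerable<string> entryNames)
    {
        var root = new ZipTreeNode(string.Empty);

        foreach (string fullName in entryNames)
        {
            string[] parts = fullName.Split(Separators, StringSplitOptions.RemoveEmptyEntries);

            // A name that ends with a separator is an explicit directory placeholder.
            bool isDirectory = fullName.Length > 0 &&
                Array.IndexOf(Separators, fullName[fullName.Length - 1]) >= 0;
            int directoryCount = isDirectory ? parts.Length : parts.Length - 1;

            // Walk down the tree, creating each directory node on first sight.
            ZipTreeNode node = root;
            for (int i = 0; i < directoryCount; i++)
            {
                if (!node.Directories.TryGetValue(parts[i], out ZipTreeNode child))
                {
                    child = new ZipTreeNode(parts[i]);
                    node.Directories.Add(parts[i], child);
                }

                node = child;
            }

            // The last component of a non-directory name is the file itself.
            if (!isDirectory && parts.Length > 0)
            {
                node.Files.Add(parts[parts.Length - 1]);
            }
        }

        return root;
    }
}
```

This reads every entry name exactly once, at the cost of holding the whole tree in memory. Once built, listing the subdirectories of any virtual directory is just a matter of walking the Directories dictionaries.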

Alternatively, just use the ZipFile.ExtractToDirectory() method to temporarily copy the .zip archive contents to some place on the actual file system, and then use the normal file system navigation classes there.
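A minimal sketch of that extraction approach follows. The class and method names are hypothetical, and the temporary folder name is chosen arbitrarily. On .NET Framework, ZipFile also needs a reference to the System.IO.Compression.FileSystem assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class ZipExtractDemo
{
    // Extracts the archive to a fresh temporary folder, collects the path of every
    // directory relative to the extraction root, then deletes the temporary folder.
    public static List<string> ListDirectories(string zipPath)
    {
        string tempRoot = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());

        try
        {
            // Creates tempRoot and every subdirectory implied by the entry names.
            ZipFile.ExtractToDirectory(zipPath, tempRoot);

            var root = new DirectoryInfo(tempRoot);
            return root.GetDirectories("*", SearchOption.AllDirectories)
                .Select(d => d.FullName.Substring(root.FullName.Length + 1))
                .ToList();
        }
        finally
        {
            if (Directory.Exists(tempRoot))
            {
                Directory.Delete(tempRoot, recursive: true);
            }
        }
    }
}
```

Because the results are ordinary DirectoryInfo objects until they are projected to relative paths, the recursive method from the question would also work unchanged against the extracted folder. The trade-off is the time and disk space needed to extract the whole archive.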

Here is an example implementation of the second approach I describe above: given a starting path, it processes the archive entries and identifies the individual virtual directory and/or file entries that are effectively in the directory specified by that path:

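using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

[Flags]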
public enum EntryInfoTypes
{
    Directory = 1,
    File = 2,
    DirectoryOrFile = Directory | File
}

public static class ZipDirectoryExtensions
{
    private static readonly char[] _pathSeparators =
        new[] { Path.AltDirectorySeparatorChar, Path.DirectorySeparatorChar };

    public static IEnumerable<ZipDirectoryInfo> EnumerateDirectories(
        this ZipArchive archive, string path)
    {
        return archive.EnumerateDirectories(path, SearchOption.TopDirectoryOnly);
    }

    public static IEnumerable<ZipDirectoryInfo> EnumerateDirectories(
        this ZipArchive archive, string path, SearchOption searchOption)
    {
        return archive.EnumerateEntryInfos(path, searchOption, EntryInfoTypes.Directory)
            .Cast<ZipDirectoryInfo>();
    }

    public static IEnumerable<ZipFileInfo> EnumerateFiles(
        this ZipArchive archive, string path)
    {
        return archive.EnumerateFiles(path, SearchOption.TopDirectoryOnly);
    }

    public static IEnumerable<ZipFileInfo> EnumerateFiles(
        this ZipArchive archive, string path, SearchOption searchOption)
    {
        return archive.EnumerateEntryInfos(path, searchOption, EntryInfoTypes.File)
            .Cast<ZipFileInfo>();
    }

    public static IEnumerable<ZipEntryInfo> EnumerateEntryInfos(
        this ZipArchive archive, string path, EntryInfoTypes entryInfoTypes)
    {
        return archive.EnumerateEntryInfos(
            path, SearchOption.TopDirectoryOnly, entryInfoTypes);
    }

    public static IEnumerable<ZipEntryInfo> EnumerateEntryInfos(this ZipArchive archive,
        string path, SearchOption searchOption, EntryInfoTypes entryInfoTypes)
    {
        // Normalize input path, by removing any path separator character from the
        // beginning, and ensuring one is present at the end. This will ensure that
        // the path variable format matches the format used in the archive and which
        // is also convenient for the implementation of the algorithm below.
        if (path.Length > 0)
        {
            if (_pathSeparators.Contains(path[0]))
            {
                path = path.Substring(1);
            }

            // Re-check the length: a path of just "/" is now empty and means the root.
            if (path.Length > 0 && !_pathSeparators.Contains(path[path.Length - 1]))
            {
                path = path + Path.AltDirectorySeparatorChar;
            }
        }

        HashSet<string> found = new HashSet<string>();

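        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            // Compare entry names ordinally; the default StartsWith(string)
            // overload is culture-sensitive.
            if (path.Length > 0 && !entry.FullName.StartsWith(path, StringComparison.Ordinal))
            {
                continue;
            }

            // An explicit directory entry for the search path itself (e.g. "dir/")
            // is not a child of that path, so skip it rather than report it as a file.
            if (entry.FullName.Length == path.Length)
            {
                continue;
            }

            int nextSeparator = entry.FullName.IndexOfAny(_pathSeparators, path.Length);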

            if (nextSeparator >= 0)
            {
                string directoryName = entry.FullName.Substring(0, nextSeparator + 1);

                if (found.Add(directoryName))
                {
                    if (entryInfoTypes.HasFlag(EntryInfoTypes.Directory))
                    {
                        yield return new ZipDirectoryInfo(directoryName);
                    }

                    if (searchOption == SearchOption.AllDirectories)
                    {
                        foreach (ZipEntryInfo info in
                            archive.EnumerateEntryInfos(
                                directoryName, searchOption, entryInfoTypes))
                        {
                            yield return info;
                        }
                    }
                }
            }
            else
            {
                if (entryInfoTypes.HasFlag(EntryInfoTypes.File))
                {
                    yield return new ZipFileInfo(entry.FullName);
                }
            }
        }
    }
}

public class ZipEntryInfo
{
    public string Name { get; }

    public ZipEntryInfo(string name)
    {
        Name = name;
    }
}
public class ZipDirectoryInfo : ZipEntryInfo
{
    public ZipDirectoryInfo(string name) : base(name) { }
}

public class ZipFileInfo : ZipEntryInfo
{
    public ZipFileInfo(string name) : base(name) { }
}

Note:

  • It's important to keep in mind that even with this implementation, you are not really dealing with actual directories. This is just a convenience for navigating the archive. In particular, you'll never be able to get anything like the actual DirectoryInfo class from .NET for an archive entry, because what might look like a single directory is really just part of the path for one or more archive entries. Properties like Attributes, CreationTime, and similar just don't make any sense. If you wanted, though, you could include things like Name and Parent (renaming my Name property to FullName of course, so it better matches the DirectoryInfo class).
  • I implemented it as extension methods because I find the calling code more readable that way.
  • There are helper types to represent file and directory entries separately. Of course, you could just pass strings back, but this would make it a lot harder to implement the base method that can return both kinds of entries.
  • I included overloads of the different methods to provide a default value for the SearchOption parameter. I guess I could've just set the default value in the method declaration; old habits die hard. :)
  • The above is about as simple an implementation as I could come up with, but it comes at a cost in efficiency. Each level of recursion enumerates the entire collection of archive entries as it searches for the entries in the current path. The efficiency could be significantly improved by maintaining a HashSet<string> of all of the entries that exist, removing an entry when it matches and is effectively consumed. This would significantly complicate the code though, so I leave that as an exercise for the reader.

Upvotes: 2

ekolis

Reputation: 6776

You can use a library such as SevenZipSharp to peek into the zip file. This library was designed for the 7z format but it can access a variety of other archive formats as well.

Upvotes: 0
