Slow unzipping of text files using C# DotNetZip (.NET 4.0)

Question

I am writing a method to extract information from zipped files. Every zip file contains exactly one text file. The method is intended to return a string array.

I am using DotNetZip, but I am experiencing horrible performance. I have tried to benchmark each step, and it seems to be slow at every step.

The C# code is:

    public string[] LoadZipFile(string FileName)
    {
        string[] lines = { };
        int start = System.Environment.TickCount;
        this.richTextBoxLOG.AppendText("Reading " + FileName + "... ");
        try
        {
            int nstart;

            nstart = System.Environment.TickCount;       
            ZipFile zip = ZipFile.Read(FileName);
            this.richTextBoxLOG.AppendText(String.Format("ZipFile ({0}ms)\n", System.Environment.TickCount - nstart));

            nstart = System.Environment.TickCount;
            MemoryStream ms = new MemoryStream();
            this.richTextBoxLOG.AppendText(String.Format("Memorystream ({0}ms)\n", System.Environment.TickCount - nstart));

            nstart = System.Environment.TickCount;
            zip[0].Extract(ms);
            this.richTextBoxLOG.AppendText(String.Format("Extract ({0}ms)\n", System.Environment.TickCount - nstart));

            nstart = System.Environment.TickCount;
            string filecontents = string.Empty;
            using (var reader = new StreamReader(ms)) 
            { 
                reader.BaseStream.Seek(0, SeekOrigin.Begin); 
                filecontents = reader.ReadToEnd().ToString(); 
            }
            this.richTextBoxLOG.AppendText(String.Format("Read ({0}ms)\n", System.Environment.TickCount - nstart));

            nstart = System.Environment.TickCount;
            // Normalize Windows line endings, then split on newlines.
            lines = filecontents.Replace("\r\n", "\n").Split("\n".ToCharArray());
            this.richTextBoxLOG.AppendText(String.Format("SplitLines ({0}ms)\n", System.Environment.TickCount - nstart));
        }
        catch (IOException ex)
        {
            this.richTextBoxLOG.AppendText(ex.Message + "\n");
        }
        int slut = System.Environment.TickCount;
        this.richTextBoxLOG.AppendText(String.Format("Done ({0}ms)\n", slut - start));
        return (lines);
    }

As an example I get this output:

Reading xxxx.zip... ZipFile (0ms) Memorystream (0ms) Extract (234ms) Read (78ms) SplitLines (187ms) Done (514ms)
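
A side note on the measurements: the step timings above (234, 78 and 187 ms) are all close to multiples of roughly 15.6 ms, which is consistent with the documented resolution of Environment.TickCount (typically 10 to 16 ms). For finer-grained numbers, System.Diagnostics.Stopwatch could be used instead. A minimal sketch, where the StepTimer class and its Measure method are hypothetical names and not part of the code above:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the original code): times a single step
// with Stopwatch instead of Environment.TickCount.
static class StepTimer
{
    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static double Measure(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

A step such as the extraction could then be timed with something like `double extractMs = StepTimer.Measure(() => zip[0].Extract(ms));`.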

That is a total of 514 ms. When the same operation is performed in Python 2.6 using this code:

    import zipfile

    def ReadZip(File):
        z = zipfile.ZipFile(File, "r")
        name = z.namelist()[0]
        return z.read(name).split('\n')

It executes in just 89 ms. Any ideas on how to improve performance are very welcome.

user1573820 · Accepted Answer

Thanks for the suggestions. I ended up changing the code in a few ways:

Using a System.Collections.Generic List<string> to return the lines
Using StreamReader.ReadLine

Removing logging and exception handling did not change performance much. I looked at the SharpZipLib unzip library, but it looked a little more complicated to implement, and from what I could read in other posts there was maybe only a small gain in unzipping speed. It is now running at around 300 ms.

    public List<string> LoadZipFile2(string FileName)
    {
        List<string> lines = new List<string>();
        int start = System.Environment.TickCount;
        string debugtext;
        debugtext = "Reading " + FileName + "... ";
        this.richTextBoxLOG.AppendText(debugtext);

        try
        {
            //int nstart = System.Environment.TickCount;
            ZipFile zip = ZipFile.Read(FileName);
            //this.richTextBoxLOG.AppendText(String.Format("ZipFile ({0}ms)\n", System.Environment.TickCount - nstart));

            //nstart = System.Environment.TickCount;
            MemoryStream ms = new MemoryStream();
            //this.richTextBoxLOG.AppendText(String.Format("Memorystream ({0}ms)\n", System.Environment.TickCount - nstart));

            //nstart = System.Environment.TickCount;
            zip[0].Extract(ms);
            zip.Dispose();
            //this.richTextBoxLOG.AppendText(String.Format("Extract ({0}ms)\n", System.Environment.TickCount - nstart));

            //nstart = System.Environment.TickCount;
            using (var reader = new StreamReader(ms))
            {
                reader.BaseStream.Seek(0, SeekOrigin.Begin);
                while (reader.Peek() >= 0)
                {
                    lines.Add(reader.ReadLine());
                }
            }
            //this.richTextBoxLOG.AppendText(String.Format("Read ({0}ms)\n", System.Environment.TickCount - nstart));
        }
        catch (IOException ex)
        {
            this.richTextBoxLOG.AppendText(ex.Message + "\n");
        }
        int slut = System.Environment.TickCount;
        this.richTextBoxLOG.AppendText(String.Format("Done ({0}ms)\n", slut - start));
        return (lines);
    }
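
The line-reading part of the change can also be isolated into a small helper that works on any Stream. This is only a sketch of that idea, assuming the default StreamReader encoding detection is acceptable; the LineReader class and ReadAllLines method are hypothetical names, not part of the code above:

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of the original code): collects the lines
// of a text stream into a List<string> using StreamReader.ReadLine.
static class LineReader
{
    // Reads every line from the stream, starting at its current position.
    // ReadLine accepts "\n", "\r" and "\r\n" as line terminators, so no
    // separate Replace/Split pass is needed. Disposing the reader also
    // closes the source stream.
    public static List<string> ReadAllLines(Stream source)
    {
        List<string> lines = new List<string>();
        using (StreamReader reader = new StreamReader(source))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

With the MemoryStream from the method above, this would be called after `ms.Seek(0, SeekOrigin.Begin)`. DotNetZip also offers `ZipEntry.OpenReader()`, which returns a readable stream for the entry, so passing `zip[0].OpenReader()` might avoid the intermediate MemoryStream altogether. That variant has not been measured here.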
