Reputation: 147
I am writing a method to extract information from zipped files. Each zip file will contain just one text file. The method is intended to return a string array.
I am using DotNetZip, but I am getting horrible performance. I have tried to benchmark each step, and it seems to be slow at every step.
The C# code is:
// Requires: using System; using System.IO; using Ionic.Zip; (DotNetZip)
public string[] LoadZipFile(string FileName)
{
    string[] lines = { };
    int start = System.Environment.TickCount;
    this.richTextBoxLOG.AppendText("Reading " + FileName + "... ");
    try
    {
        int nstart;
        nstart = System.Environment.TickCount;
        // Dispose the archive when done so the file handle is released.
        using (ZipFile zip = ZipFile.Read(FileName))
        {
            this.richTextBoxLOG.AppendText(String.Format("ZipFile ({0}ms)\n", System.Environment.TickCount - nstart));
            nstart = System.Environment.TickCount;
            MemoryStream ms = new MemoryStream();
            this.richTextBoxLOG.AppendText(String.Format("Memorystream ({0}ms)\n", System.Environment.TickCount - nstart));
            nstart = System.Environment.TickCount;
            // Decompress the single text file into memory.
            zip[0].Extract(ms);
            this.richTextBoxLOG.AppendText(String.Format("Extract ({0}ms)\n", System.Environment.TickCount - nstart));
            nstart = System.Environment.TickCount;
            string filecontents = string.Empty;
            // Disposing the reader also disposes ms.
            using (var reader = new StreamReader(ms))
            {
                reader.BaseStream.Seek(0, SeekOrigin.Begin);
                filecontents = reader.ReadToEnd();
            }
            this.richTextBoxLOG.AppendText(String.Format("Read ({0}ms)\n", System.Environment.TickCount - nstart));
            nstart = System.Environment.TickCount;
            // Normalize line endings, then split into lines.
            lines = filecontents.Replace("\r\n", "\n").Split("\n".ToCharArray());
            this.richTextBoxLOG.AppendText(String.Format("SplitLines ({0}ms)\n", System.Environment.TickCount - nstart));
        }
    }
    catch (IOException ex)
    {
        this.richTextBoxLOG.AppendText(ex.Message + "\n");
    }
    int slut = System.Environment.TickCount;
    this.richTextBoxLOG.AppendText(String.Format("Done ({0}ms)\n", slut - start));
    return lines;
}
For example, I get this output:
Reading xxxx.zip... ZipFile (0ms)
Memorystream (0ms)
Extract (234ms)
Read (78ms)
SplitLines (187ms)
Done (514ms)
That is 514 ms in total. When the same operation is performed in Python 2.6 using this code:
import zipfile

def ReadZip(File):
    z = zipfile.ZipFile(File, "r")
    name = z.namelist()[0]
    return z.read(name).split('\r\n')
It executes in just 89 ms. Any ideas on how to improve performance are very welcome.
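A closer C# counterpart to the Python ReadZip function would do the same minimal work and nothing more. It would open the archive, read the first entry, and split once on \r\n, with no Replace pass and no logging. The sketch below makes two assumptions. First, it uses .NET's built-in System.IO.Compression.ZipArchive (available from .NET Framework 4.5 onward) as a swapped-in library, not DotNetZip. Second, it assumes the text file is UTF-8 or ASCII. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper mirroring the Python ReadZip function above.
// Assumption: uses the built-in System.IO.Compression.ZipArchive, not DotNetZip,
// and assumes the text file is UTF-8 or ASCII.
public static class ZipLines
{
    public static string[] ReadZip(string fileName)
    {
        using (FileStream fs = File.OpenRead(fileName))
        using (ZipArchive archive = new ZipArchive(fs, ZipArchiveMode.Read))
        {
            // Like z.namelist()[0] in Python: take the first (and only) entry.
            ZipArchiveEntry entry = archive.Entries[0];
            using (StreamReader reader = new StreamReader(entry.Open(), Encoding.UTF8))
            {
                string contents = reader.ReadToEnd();
                // Split on "\r\n" in one pass, like split('\r\n') in Python,
                // rather than Replace("\r\n", "\n") followed by Split('\n').
                return contents.Split(new[] { "\r\n" }, StringSplitOptions.None);
            }
        }
    }
}
```

Timing this without any RichTextBox calls gives a fairer comparison with the Python number.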
Upvotes: 0
Views: 1837
Reputation: 147
Thanks for the suggestions. I ended up changing the code in a few ways (see the method below): the ZipFile is now disposed, and the lines are read one at a time with StreamReader.ReadLine instead of splitting the whole string.
Removing logging and exception handling did not change performance much. I looked at SharpZipLib's unzip library, but it looked a little more complicated to implement, and from what I read in other posts the gain in unzipping speed would probably be small. It now runs in around 300 ms.
// Requires: using System; using System.Collections.Generic; using System.IO; using Ionic.Zip; (DotNetZip)
public List<string> LoadZipFile2(string FileName)
{
    List<string> lines = new List<string>();
    int start = System.Environment.TickCount;
    this.richTextBoxLOG.AppendText("Reading " + FileName + "... ");
    try
    {
        // Intermediate per-step timing and logging removed.
        using (ZipFile zip = ZipFile.Read(FileName))
        using (MemoryStream ms = new MemoryStream())
        {
            zip[0].Extract(ms);
            ms.Seek(0, SeekOrigin.Begin);
            using (var reader = new StreamReader(ms))
            {
                // ReadLine returns null at the end of the stream.
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    lines.Add(line);
                }
            }
        }
    }
    catch (IOException ex)
    {
        this.richTextBoxLOG.AppendText(ex.Message + "\n");
    }
    int slut = System.Environment.TickCount;
    this.richTextBoxLOG.AppendText(String.Format("Done ({0}ms)\n", slut - start));
    return lines;
}
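A further variation on the version above is to skip the intermediate MemoryStream entirely and read lines straight from the decompression stream. The sketch below does this. It uses the built-in System.IO.Compression.ZipArchive as a swapped-in library, not DotNetZip, and the ZipLineReader class is hypothetical. Whether it is actually faster than extracting to a MemoryStream first should be confirmed by timing it on your own files.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: streams lines from the first entry of a zip archive.
// Assumption: uses System.IO.Compression.ZipArchive instead of DotNetZip.
public static class ZipLineReader
{
    public static List<string> ReadFirstEntryLines(string fileName)
    {
        var lines = new List<string>();
        using (FileStream fs = File.OpenRead(fileName))
        using (ZipArchive archive = new ZipArchive(fs, ZipArchiveMode.Read))
        // Decompress on the fly; no intermediate MemoryStream copy.
        using (StreamReader reader = new StreamReader(archive.Entries[0].Open()))
        {
            string line;
            // ReadLine handles "\r\n", "\n" and "\r" and returns null at end of stream.
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

The two approaches differ when the file ends with a newline. ReadLine produces no trailing empty entry in that case, whereas Split does.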
Upvotes: 1
Reputation: 34198
Your code is not like-for-like, so the comparison is unfair. Some important points:
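- The AppendText calls to the RichTextBox will be responsible for some of the extra time.
- The C# code does a Replace followed by a Split, whereas the Python code only splits on \r\n. You could split on \r\n directly instead.
- It may be faster to use StreamReader.ReadLine than to read the whole stream and then split it manually.
In short, you should profile some of the alternative methods, and you should time your code without using the RichTextBox for intermediate logging if you want a true like-for-like comparison.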
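The advice above can be followed with a small benchmark like the sketch below. It uses System.Diagnostics.Stopwatch, which has much finer resolution than Environment.TickCount (typically 10 to 16 ms), and it keeps all RichTextBox logging out of the measured region. It times only the read-and-split step on already-extracted bytes, so decompression and UI cost do not muddy the comparison. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical micro-benchmark for the line-splitting strategies discussed above.
// Input is the already-extracted file content as bytes.
public static class SplitBenchmark
{
    // Original approach: read everything, normalize line endings, split on '\n'.
    public static string[] ReplaceThenSplit(byte[] data)
    {
        using (var reader = new StreamReader(new MemoryStream(data)))
        {
            return reader.ReadToEnd().Replace("\r\n", "\n").Split('\n');
        }
    }

    // Read everything, then split on "\r\n" directly, as the Python code does.
    public static string[] ReadAllThenSplit(byte[] data)
    {
        using (var reader = new StreamReader(new MemoryStream(data)))
        {
            return reader.ReadToEnd().Split(new[] { "\r\n" }, StringSplitOptions.None);
        }
    }

    // Read one line at a time with StreamReader.ReadLine.
    public static List<string> ReadLineByLine(byte[] data)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(new MemoryStream(data)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    // Runs the action several times and returns the fastest time in milliseconds.
    public static double BestOf(int runs, Action action)
    {
        double best = double.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}
```

Usage: `double ms = SplitBenchmark.BestOf(5, () => SplitBenchmark.ReadLineByLine(data));`, where `data` holds the extracted bytes. Taking the best of several runs keeps one-off costs, such as JIT compilation on the first call, from dominating the result.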
Upvotes: 1