Reputation: 234
I'm merging text files (.itf) that are located in a folder, applying some logic along the way. When I compile it as 32-bit (console application, .NET 4.6) everything works fine, except that I get OutOfMemoryException if there is a lot of data in the folders. Compiling it as 64-bit solves that problem, but the process runs extremely slowly compared to the 32-bit one (more than 15 times slower).
I tried it with BufferedStream and ReadAllLines, but both perform very poorly. The profiler tells me that these methods use 99% of the time. I don't know where the problem is...
Here's the code:
private static void readData(Dictionary<string, Topic> topics)
{
    foreach (string file in Directory.EnumerateFiles(Path, "*.itf"))
    {
        Topic currentTopic = null;
        Table currentTable = null;
        Object currentObject = null;
        using (var fs = File.Open(file, FileMode.Open))
        {
            using (var bs = new BufferedStream(fs))
            {
                using (var sr = new StreamReader(bs, Encoding.Default))
                {
                    string line;
                    while ((line = sr.ReadLine()) != null)
                    {
                        // End markers (ETOP/ETAB/ELIN) reset the current parsing context.
                        if (line.IndexOf("ETOP") > -1)
                        {
                            currentTopic = null;
                        }
                        else if (line.IndexOf("ETAB") > -1)
                        {
                            currentTable = null;
                        }
                        else if (line.IndexOf("ELIN") > -1)
                        {
                            currentObject = null;
                        }
                        else if (line.IndexOf("MTID") > -1)
                        {
                            MTID = line.Replace("MTID ", "");
                        }
                        else if (line.IndexOf("MODL") > -1)
                        {
                            MODL = line.Replace("MODL ", "");
                        }
                        else if (line.IndexOf("TOPI") > -1)
                        {
                            var name = line.Replace("TOPI ", "");
                            if (topics.ContainsKey(name))
                            {
                                currentTopic = topics[name];
                            }
                            else
                            {
                                var topic = new Topic(name);
                                currentTopic = topic;
                                topics.Add(name, topic);
                            }
                        }
                        else if (line.IndexOf("TABL") > -1)
                        {
                            var name = line.Replace("TABL ", "");
                            if (currentTopic.Tables.ContainsKey(name))
                            {
                                currentTable = currentTopic.Tables[name];
                            }
                            else
                            {
                                var table = new Table(name);
                                currentTable = table;
                                currentTopic.Tables.Add(name, table);
                            }
                        }
                        else if (line.IndexOf("OBJE") > -1)
                        {
                            if (currentTable.Name != "Metadata" || currentTable.Objects.Count == 0)
                            {
                                var shortLine = line.Replace("OBJE ", "");
                                var obje = new Object(shortLine.Substring(shortLine.IndexOf(" ")));
                                currentObject = obje;
                                currentTable.Objects.Add(obje);
                            }
                        }
                        else if (currentTopic != null && currentTable != null && currentObject != null)
                        {
                            // Any other line is data belonging to the current object.
                            currentObject.Data.Add(line);
                        }
                    }
                }
            }
        }
    }
}
Upvotes: 4
Views: 1343
Reputation: 1
Unchecking the code optimization checkbox should typically result in a performance slowdown, not a speedup, so there may be an issue in the VS 2015 product. Please provide a stand-alone repro case, with an input set for your program that demonstrates the performance problem, and report it at: http://connect.microsoft.com/
Upvotes: 0
Reputation: 942408
The biggest problem with your program is that, when you let it run in 64-bit mode, it can read a lot more files. Which is nice: a 64-bit process has a thousand times more address space than a 32-bit process, so running out of it is exceedingly unlikely.
But you do not get a thousand times more RAM.
The universal principle of "there is no free lunch" is at work here. Having enough RAM matters a great deal in a program like this. First and foremost, it is used by the file system cache, the magical operating system feature that makes it look like reading files from a disk is very cheap. It is not cheap at all, it is one of the slowest things you can do in a program, but the cache is very good at hiding it. You'll notice it when you run your program more than once: the second and subsequent times you won't read from the disk at all. That's a pretty dangerous feature and very hard to avoid when you test your program; it gives you very unrealistic assumptions about how efficient it is.
The problem with a 64-bit process is that it easily makes the file system cache ineffective: since you can read a lot more files, you overwhelm the cache and old file data gets evicted. Now the second time you run your program it will not be fast anymore; the files you read are no longer in the cache and must be read from the disk. You'll now see the real perf of your program, the way it will behave in production. That's a good thing, even though you don't like it very much :)
The secondary problem with RAM is the lesser one: if you allocate a lot of memory to store the file data, you force the operating system to find the RAM to store it. That can cause a lot of hard page faults, incurred when it must unmap memory used by another process, or by yours, to free up the RAM you need, a generic problem called "thrashing". Page faults are something you can see in Task Manager; use View > Select Columns to add the column.
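Task Manager shows the page fault count per process; from inside the process you can at least watch the working set grow while the merge runs. A minimal sketch using standard System.Diagnostics properties (where you call it is up to you, for example once per parsed file; the class and method names are just for illustration):

    using System;
    using System.Diagnostics;

    static class MemoryLog
    {
        // Logs how much physical RAM the process currently holds and its peak,
        // a rough proxy for the memory pressure that leads to hard page faults.
        public static void Report(string label)
        {
            using (var p = Process.GetCurrentProcess())
            {
                Console.WriteLine("{0}: working set {1:N0} MB (peak {2:N0} MB), private bytes {3:N0} MB",
                    label,
                    p.WorkingSet64 / (1024 * 1024),
                    p.PeakWorkingSet64 / (1024 * 1024),
                    p.PrivateMemorySize64 / (1024 * 1024));
            }
        }
    }

If the 64-bit build shows a working set that keeps climbing while the 32-bit build tops out, that is the extra data it can now hold in memory.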
Given that the file system cache is the most likely source of the slow-down, a simple test you can do is to reboot your machine, which ensures that the cache cannot contain any of the file data, and then run the 32-bit version. The prediction is that it will also be slow, and that BufferedStream and ReadAllLines will be the bottlenecks. As they should be.
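To make that test concrete, time a full pass over the folder right after the reboot (cold cache) and again immediately afterwards (warm cache). A minimal sketch, meant to sit in the same class as the readData method from the question and reusing its usings; the method name is only illustrative:

    private static void TimeOnePass()
    {
        // First run after a reboot = cold file system cache, second run = warm cache.
        // A large gap between the two timings points at disk I/O, not parsing cost.
        var sw = Stopwatch.StartNew();
        var topics = new Dictionary<string, Topic>();
        readData(topics);
        sw.Stop();
        Console.WriteLine("Parsed {0} topics in {1:F1} s", topics.Count, sw.Elapsed.TotalSeconds);
    }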
One final note: even though your program doesn't match the pattern, you cannot make strong assumptions about .NET 4.6 perf problems yet, not until this very nasty bug gets fixed.
Upvotes: 4
Reputation: 234
I was able to solve it. It seems there is a bug in the .NET compiler: unchecking the code optimization checkbox in VS 2015 led to a huge performance increase. The program now runs with performance similar to the 32-bit version. My final version with some optimizations:
private static void readData(ref Dictionary<string, Topic> topics)
{
    // Compiled regexes are created once and reused for every line, instead of string.Replace.
    Regex rgxOBJE = new Regex("OBJE [0-9]+ ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    Regex rgxTABL = new Regex("TABL ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    Regex rgxTOPI = new Regex("TOPI ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    Regex rgxMTID = new Regex("MTID ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    Regex rgxMODL = new Regex("MODL ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    foreach (string file in Directory.EnumerateFiles(Path, "*.itf"))
    {
        if (file.IndexOf("itf_merger_result") == -1)
        {
            Topic currentTopic = null;
            Table currentTable = null;
            Object currentObject = null;
            using (var sr = new StreamReader(file, Encoding.Default))
            {
                Stopwatch sw = new Stopwatch();
                sw.Start();
                Console.WriteLine(file + " read, parsing ...");
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    if (line.IndexOf("OBJE") > -1)
                    {
                        if (currentTable.Name != "Metadata" || currentTable.Objects.Count == 0)
                        {
                            var obje = new Object(rgxOBJE.Replace(line, ""));
                            currentObject = obje;
                            currentTable.Objects.Add(obje);
                        }
                    }
                    else if (line.IndexOf("TABL") > -1)
                    {
                        var name = rgxTABL.Replace(line, "");
                        if (currentTopic.Tables.ContainsKey(name))
                        {
                            currentTable = currentTopic.Tables[name];
                        }
                        else
                        {
                            var table = new Table(name);
                            currentTable = table;
                            currentTopic.Tables.Add(name, table);
                        }
                    }
                    else if (line.IndexOf("TOPI") > -1)
                    {
                        var name = rgxTOPI.Replace(line, "");
                        if (topics.ContainsKey(name))
                        {
                            currentTopic = topics[name];
                        }
                        else
                        {
                            var topic = new Topic(name);
                            currentTopic = topic;
                            topics.Add(name, topic);
                        }
                    }
                    else if (line.IndexOf("ETOP") > -1)
                    {
                        currentTopic = null;
                    }
                    else if (line.IndexOf("ETAB") > -1)
                    {
                        currentTable = null;
                    }
                    else if (line.IndexOf("ELIN") > -1)
                    {
                        currentObject = null;
                    }
                    else if (currentTopic != null && currentTable != null && currentObject != null)
                    {
                        currentObject.Data.Add(line);
                    }
                    else if (line.IndexOf("MTID") > -1)
                    {
                        MTID = rgxMTID.Replace(line, "");
                    }
                    else if (line.IndexOf("MODL") > -1)
                    {
                        MODL = rgxMODL.Replace(line, "");
                    }
                }
                sw.Stop();
                Console.WriteLine(file + " parsed in {0}s", sw.ElapsedMilliseconds / 1000.0);
            }
        }
    }
}
Upvotes: 1
Reputation: 29471
A few tips:

- Why use a BufferedStream and then a StreamReader when you can do the job with just a StreamReader, which is already buffered?
- Parallel.ForEach can help if you want to process several files at once (see the sketch below).
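To illustrate both tips together, here is a minimal sketch, not the original poster's code: it assumes the Topic/Table types and prefix handling from the question, swaps the shared dictionary for a ConcurrentDictionary so the parallel loop is safe, and elides everything except the TOPI handling. The extra usings go at the top of the file.

    using System.Collections.Concurrent;
    using System.IO;
    using System.Text;
    using System.Threading.Tasks;

    private static void readDataParallel(ConcurrentDictionary<string, Topic> topics)
    {
        // One file per iteration; files are independent, so they can be parsed in parallel.
        Parallel.ForEach(Directory.EnumerateFiles(Path, "*.itf"), file =>
        {
            // StreamReader buffers internally, so no BufferedStream is needed.
            using (var sr = new StreamReader(file, Encoding.Default))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    if (line.StartsWith("TOPI "))
                    {
                        var name = line.Substring(5);
                        // GetOrAdd keeps the topic lookup thread-safe across files.
                        var currentTopic = topics.GetOrAdd(name, n => new Topic(n));
                        // ... handle TABL/OBJE/ELIN/etc. against currentTopic as in the question ...
                    }
                }
            }
        });
    }

Whether the parallel loop actually helps depends on whether the run is disk-bound or CPU-bound; on a single spinning disk, reading several files at once can even make things worse.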
Upvotes: 1