Anton Burtsev

Reputation: 55

How to achieve 100% CPU usage in multithreaded application?

I have ~100 text files, 200 MB each, that I need to parse. The program below loads the files and processes them in parallel; it can create either a thread per file or a process per file.

The problem: if I use threads, the program never reaches 100% CPU and takes much longer to complete.

THREAD PER FILE
total time: 430 sec
CPU usage: 15-20%
CPU frequency: 1.2 GHz

PROCESS PER FILE
total time: 100 sec
CPU usage: 100%
CPU frequency: 3.75 GHz

I'm using an E5-1650 v3 hexa-core with HT, so I process 12 files at a time.

How can I achieve 100% CPU utilisation with threads?

The code below does not use the result of the processing, since that does not affect the problem.

using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Threading;

namespace libsvm2tsv
{
    class Program
    {
        static void Main(string[] args)
        {
            var sw = Stopwatch.StartNew();

            switch (args[0])
            {
                case "-t": LoadAll(args[1], LoadFile); break;
                case "-p": LoadAll(args[1], RunChild); break;
                case "-f": LoadFile(args[1]); return;
            }

            Console.WriteLine("ELAPSED: {0} sec.", sw.ElapsedMilliseconds / 1000);
            Console.ReadLine();
        }

        static void LoadAll(string folder, Action<string> algorithm)
        {
            // Admit at most 12 files at a time (6 cores + HT).
            var sem = new SemaphoreSlim(12);
            Directory.EnumerateFiles(folder).ToList().ForEach(f => {
                sem.Wait();
                new Thread(() => { try { algorithm(f); } finally { sem.Release(); } }).Start();
            });
        }

        static void RunChild(string file)
        {
            Process.Start(new ProcessStartInfo
            {
                FileName = Assembly.GetEntryAssembly().Location,
                Arguments = "-f \"" + file + "\"",
                UseShellExecute = false,
                CreateNoWindow = true
            })
            .WaitForExit();
        }

        static void LoadFile(string inFile)
        {
            using (var ins = File.OpenText(inFile))
                while (ins.Peek() >= 0)
                    ParseLine(ins.ReadLine());
        }

        static long[] ParseLine(string line)
        {
            return line
                .Split()
                .Skip(1)
                .Select(r => (long)(double.Parse(r.Split(':')[1]) * 1000))
                .Select(r => r < 0 ? -1 : r)
                .ToArray();
        }
    }
}

Upvotes: 2

Views: 3322

Answers (4)

Anton Burtsev

Reputation: 55

Finally, I've found the bottleneck. I was using string.Split to parse the numbers out of every line of data, producing billions of short strings on the heap. Since all threads share a single heap, memory allocation is synchronized; processes each have their own heap, so no synchronization occurs and things run fast. That was the root of the issue. So I rewrote the parsing with IndexOf instead of Split, and the threads started to perform even better than separate processes, just as I expected.

Since .NET has no built-in way to parse a real number starting at a given position inside a string, I used this one: https://codereview.stackexchange.com/questions/75791/optimize-custom-double-parse with a small modification.

using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Threading;
using System.Threading.Tasks;

namespace libsvm2tsv
{
    class Program
    {

        static void Main(string[] args)
        {
            var sw = Stopwatch.StartNew();

            switch (args[0])
            {
                case "-t": LoadAll(args[1], LoadFile); break;
                case "-p": LoadAll(args[1], RunChild); break;
                case "-f": LoadFile(args[1]); return;
            }

            Console.WriteLine("ELAPSED: {0} sec.", sw.ElapsedMilliseconds / 1000);
            Console.ReadLine();
        }

        static void LoadAll(string folder, Action<string> algorithm)
        {
            Parallel.ForEach(
                Directory.EnumerateFiles(folder),
                new ParallelOptions { MaxDegreeOfParallelism = 12 },
                f => algorithm(f));
        }

        static void RunChild(string file)
        {
            Process.Start(new ProcessStartInfo
            {
                FileName = Assembly.GetEntryAssembly().Location,
                Arguments = "-f \"" + file + "\"",
                UseShellExecute = false,
                CreateNoWindow = true
            })
            .WaitForExit();
        }

        static void LoadFile(string inFile)
        {
            using (var ins = File.OpenText(inFile))
                while (ins.Peek() >= 0)
                    ParseLine(ins.ReadLine());
        }

        static long[] ParseLine(string line)
        {
            // first, count number of items
            var items = 1;
            for (var i = 0; i < line.Length; i++)
                if (line[i] == ' ') items++;

            //allocate memory and parse items
            var all = new long[items];
            var n = 0;
            var index = 0;
            while (index < line.Length)
            {
                var next = line.IndexOf(' ', index);
                if (next < 0) next = line.Length;
                if (next > index)
                {
                    var v = (long)(parseDouble(line, line.IndexOf(':', index) + 1, next - 1) * 1000);
                    if (v < 0) v = -1;
                    all[n++] = v;
                }
                index = next + 1;
            }

            return all;
        }

        private readonly static double[] pow10Cache;
        static Program()
        {
            pow10Cache = new double[309];

            double p = 1.0;
            for (int i = 0; i < 309; i++)
            {
                pow10Cache[i] = p;
                p /= 10;
            }
        }

        static double parseDouble(string input, int from, int to)
        {
            long inputLength = to - from + 1;
            long digitValue = long.MaxValue;
            long output1 = 0;
            long output2 = 0;
            long sign = 1;
            double multiBy = 0.0;
            int k;

            //integer part
            for (k = 0; k < inputLength; ++k)
            {
                digitValue = input[k + from] - 48; // '0'

                if (digitValue >= 0 && digitValue <= 9)
                {
                    output1 = digitValue + (output1 * 10);
                }
                else if (k == 0 && digitValue == -3 /* '-' */)
                {
                    sign = -1;
                }
                else if (digitValue == -2 /* '.' */ || digitValue == -4 /* ',' */)
                {
                    break;
                }
                else
                {
                    return double.NaN;
                }
            }

            //decimal part
            if (digitValue == -2 /* '.' */ || digitValue == -4 /* ',' */)
            {
                multiBy = pow10Cache[inputLength - (++k)];

                for (; k < inputLength; ++k)
                {
                    digitValue = input[k + from] - 48; // '0'

                    if (digitValue >= 0 && digitValue <= 9)
                    {
                        output2 = digitValue + (output2 * 10);
                    }
                    else
                    {
                        return double.NaN;
                    }
                }

                multiBy *= output2;
            }

            return sign * (output1 + multiBy);
        }
    }
}
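
One usage note: parseDouble takes inclusive from/to indices into the string, so the caller points it at the first and last characters of the number. A tiny sketch (the sample token is hypothetical, not from the real data):

var token = "101:0.25"; // hypothetical libsvm-style "index:value" pair
// Parse the value part: from the first character after ':' through the last character.
var value = parseDouble(token, token.IndexOf(':') + 1, token.Length - 1); // 0.25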

Upvotes: 2

ewramner

Reputation: 6233

As others have stated, IO will probably be the bottleneck in the end, and getting 100% CPU usage is really irrelevant. I feel they are missing something, though: you do get higher throughput with processes than with threads, which means IO is not the only bottleneck. The reason is that the CPU runs at a higher frequency with processes, and you want it running at peak speed whenever it is not waiting for IO. So, how can you do that?

The easiest way is to set the power profile manually: edit your power options and set both the minimum and maximum processor state to 100%. That should do the job.

If you want to do it from your program, have a look at How to Disable Dynamic Frequency Scaling?. There is probably a similar API for .NET that avoids native code, but I couldn't find it just now.
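
For example, one way to do it without native code is to shell out to powercfg, which can raise the processor state of the active power scheme. A rough sketch (assuming Windows' standard SCHEME_CURRENT/SUB_PROCESSOR/PROCTHROTTLEMIN aliases; needs administrator rights):

using System.Diagnostics;

static void ForceFullProcessorState()
{
    // Set minimum and maximum processor state to 100% on AC power
    // (use /setdcvalueindex for battery), then re-apply the scheme.
    RunPowerCfg("/setacvalueindex SCHEME_CURRENT SUB_PROCESSOR PROCTHROTTLEMIN 100");
    RunPowerCfg("/setacvalueindex SCHEME_CURRENT SUB_PROCESSOR PROCTHROTTLEMAX 100");
    RunPowerCfg("/setactive SCHEME_CURRENT");
}

static void RunPowerCfg(string arguments)
{
    using (var p = Process.Start(new ProcessStartInfo
    {
        FileName = "powercfg",
        Arguments = arguments,
        UseShellExecute = false,
        CreateNoWindow = true
    }))
        p.WaitForExit();
}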

Upvotes: 0

Andrew Henle

Reputation: 1

I have ~100 text files 200MB each and I need to parse them.

The fastest way to read or write data from/to a spinning disk is sequentially, in order to minimize the time the disk heads spend seeking. So doing parallel IO to a single disk is going to slow IO rates down, and depending on the actual IO pattern it can slow them down dramatically. A disk that can handle 100 MB/sec sequentially might only manage 20 or 30 kilobytes per second doing parallel reads/writes of small blocks of data.

Were I optimizing such a process, I wouldn't worry about CPU utilization first; I'd optimize IO throughput first. You are IO bound unless you're doing some really CPU-intensive parsing. Once your IO throughput is optimized, if you're getting 100% CPU utilization then you're CPU bound, and if your design scales nicely you can add CPUs and probably run faster.

To speed up your IO, you first need to minimize disk seeks, especially if you're using consumer-grade, cheap SATA drives. There are multiple ways to do this.

First, the easiest - eliminate the disk heads. Put your data on SSDs. Problem solved without having to write complex, bug-prone optimized code. How much time will it take for you to make this run faster using software? You have to design something, test it, tune it, debug it, and importantly, keep it running and running well. None of that is free. One important cost is the opportunity cost of spending time making things go faster - when you're doing that, you're not solving any other problems. Faster hardware has none of those costs. In this case, buy the SSDs, plug them in, and you're faster.

But if you really want to spend several weeks or longer optimizing your processing software, here's how I'd go about it:

  1. Spread the data over multiple disks. You can't do fast parallel IO to a single physical disk, as the head seeks will kill performance, so do as much of the reading and writing on different disks as possible.
  2. For each disk, have a single reader or writer thread or process that feeds data to a worker pool or writes data provided by that worker pool.
  3. Use a tunable number of worker threads/processes to do the actual parsing.

That way, you can read the input files and write the output data sequentially, without contention on each disk from other IO processes. A minimal sketch of such a pipeline is below.
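
Here is one way that design could look, assuming a single sequential reader per disk feeding a bounded queue that a pool of parser workers drains (ProcessFolder is a hypothetical name; ParseLine is the parser from the question):

using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static void ProcessFolder(string folder, int workers)
{
    // Bounded queue: the single reader blocks when the workers fall behind,
    // so the disk is always read sequentially and never outpaced.
    using (var lines = new BlockingCollection<string>(boundedCapacity: 10000))
    {
        // A tunable pool of CPU-bound parser workers; no IO happens here.
        var parsers = Enumerable.Range(0, workers)
            .Select(_ => Task.Run(() =>
            {
                foreach (var line in lines.GetConsumingEnumerable())
                    ParseLine(line);
            }))
            .ToArray();

        // One sequential reader per physical disk (this sketch assumes
        // all files in 'folder' live on the same disk).
        foreach (var file in Directory.EnumerateFiles(folder))
            foreach (var line in File.ReadLines(file))
                lines.Add(line);

        lines.CompleteAdding();
        Task.WaitAll(parsers);
    }
}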

Upvotes: 1

mjwills

Reputation: 23819

I would consider replacing ForEach with Parallel.ForEach and removing your explicit use of threads. Use https://stackoverflow.com/a/5512363/34092 to limit the number of threads, if needed.

static void LoadAll(string folder, Action<string> algorithm)
{
    Parallel.ForEach(Directory.EnumerateFiles(folder), algorithm);
}
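
Following that link, capping the degree of parallelism would look like this (12 matching the question's 6 cores + HT; this is essentially what the accepted answer ended up doing):

static void LoadAll(string folder, Action<string> algorithm)
{
    Parallel.ForEach(
        Directory.EnumerateFiles(folder),
        new ParallelOptions { MaxDegreeOfParallelism = 12 },
        algorithm);
}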

Upvotes: 0
