Toxicable

Reputation: 1709

What operations in this program are making it so slow?

So the input for this program is a file of 447,000 lines; each line is data that can be split into a list of six fields.

It takes about 30 minutes to process in its current state. I had it as a non-parallel foreach loop before, but that also took a long time (I didn't record how long), so I'm not actually sure if I've saved much time by making it run in parallel. Basically, I don't know which operation is making it take so long to process, or how to find out. I tried using the Diagnostic tools, but they weren't very accurate: everything was reported as 1ms.

As far as I understand, every operation I'm doing is O(1). If that's true, is it already running as fast as it can?

double bytesTransferred = 0;
double firstTimeValue = 0;
double lastTimeValue = 0;
double totalTimeValue = 0;
var dataAsList =  data.ToList();
var justTheTimeDifferences = new ConcurrentBag<double>();
var senderHostsHash = new ConcurrentBag<double>();
var receiverHostsHash = new ConcurrentBag<double>();
var sourcePortsHash = new ConcurrentBag<double>();
var destinationPortsHash = new ConcurrentBag<double>();
int lastPosition = dataAsList.Count;

Parallel.ForEach(dataAsList, item =>
{
    var currentIndex = dataAsList.IndexOf(item);
    Console.WriteLine($"{currentIndex}/{lastPosition}");

    var itemAsList = item.Split(' ');
    if (dataAsList.IndexOf(item) == 0)
    {
        bytesTransferred += Convert.ToDouble(itemAsList[5]);
        return;
    }

    if (currentIndex == lastPosition - 1)
    {
        lastTimeValue = Convert.ToDouble(itemAsList[0]);
        totalTimeValue = lastTimeValue - firstTimeValue;
    }

    bytesTransferred += Convert.ToDouble(itemAsList[5]);
    var currentTime = Convert.ToDouble(itemAsList[0]);
    var lastEntry = dataAsList[currentIndex - 1];

    justTheTimeDifferences.Add(currentTime - Convert.ToDouble(lastEntry.Split(' ')[0]));
    senderHostsHash.Add(Convert.ToDouble(itemAsList[1]));
    receiverHostsHash.Add(Convert.ToDouble(itemAsList[2]));
    sourcePortsHash.Add(Convert.ToDouble(itemAsList[3]));
    destinationPortsHash.Add(Convert.ToDouble(itemAsList[4]));
});

An example input would be:

0.000000 1 2 23 2436 1  
0.010445 2 1 2436 23 2  
0.023775 1 2 23 2436 2  
0.026558 2 1 2436 23 1  
0.029002 3 4 3930 119 42  
0.032439 4 3 119 3930 15  
0.049618 1 2 23 2436 1 

To add some more information: I am running this on my desktop with a 4-core CPU at 4GHz. The data is read off an SSD; Task Manager shows 0% disk usage while it runs. I have also dropped the Console.WriteLine and am running it again now, then will do some Stopwatch benchmarks.

Solution:
It was the IndexOf lookup that was causing the huge run time. After changing it all to Parallel.For, it only takes about 1.25 seconds to process.

Upvotes: 0

Views: 122

Answers (1)

Regis Portalez

Reputation: 4860

As pointed out above, the IndexOf call inside the loop makes your algorithm O(n²), which is bad...

You should consider a simple Parallel.For over an array.
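A minimal sketch of that approach (variable names are illustrative, and the sample data is hard-coded for demonstration): Parallel.For hands each iteration its index directly, so the O(n) IndexOf scan disappears, and array indexing by that index is O(1). Note also that `bytesTransferred += ...` in your original loop is a data race across threads; one way to make it safe for a double is an Interlocked.CompareExchange loop.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class TraceStats
{
    static void Main()
    {
        // Sample lines in the question's format: time sender receiver srcPort dstPort bytes
        string[] dataAsArray =
        {
            "0.000000 1 2 23 2436 1",
            "0.010445 2 1 2436 23 2",
            "0.023775 1 2 23 2436 2"
        };

        var timeDiffs = new ConcurrentBag<double>();
        // Handle the first line once, outside the loop, instead of testing IndexOf == 0
        double bytesTransferred = double.Parse(dataAsArray[0].Split(' ')[5]);

        // Parallel.For supplies the index i, so no O(n) IndexOf scan is needed,
        // and dataAsArray[i - 1] finds the previous entry in O(1)
        Parallel.For(1, dataAsArray.Length, i =>
        {
            var fields = dataAsArray[i].Split(' ');
            var prevTime = double.Parse(dataAsArray[i - 1].Split(' ')[0]);

            // double has no Interlocked.Add overload; retry with CompareExchange
            double bytes = double.Parse(fields[5]);
            double seen, updated;
            do
            {
                seen = bytesTransferred;
                updated = seen + bytes;
            } while (Interlocked.CompareExchange(ref bytesTransferred, updated, seen) != seen);

            timeDiffs.Add(double.Parse(fields[0]) - prevTime);
        });

        Console.WriteLine(bytesTransferred); // 1 + 2 + 2 = 5
    }
}
```

Dropping the per-item Console.WriteLine also matters: console output serializes the threads and dominates the run time once the IndexOf scans are gone.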

Upvotes: 1
