Reputation: 122
I'm having trouble finding an answer to this question, and it may be due to poor phrasing.
I have a small Python program that extracts data from a large log file and then displays the data in a particular format. Nothing fancy; it just reads, parses, and prints.
It takes about a minute to do this.
Now, I want to run this across 300 files. If I put my code inside a loop that iterates over the 300 files and executes the same piece of code, one by one, it will take 300 minutes to complete. I would rather it didn't take this long.
I have 8 virtual processors on this machine, and it can handle extra load while this program is running. Can I spread the workload over these vCPUs to reduce the total runtime? If so, what is the ideal way to implement this?
It's not code I'm after; it's the theory behind it.
Thanks
Upvotes: 0
Views: 63
Reputation: 40679
Don't make parallelism your first priority. Your first priority should be making the single-threaded performance as fast as possible, and the way to do that is to find out where the time is actually going before changing anything. From your short description, it sounds like there might be juicy opportunities for speedup in the I/O and in the parsing.
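For example, a quick pass with `cProfile` on one representative file will show whether the minute is spent reading, parsing, or printing. This is a minimal sketch, where `process_log` is a hypothetical stand-in for your real read/parse/print code:

```python
# Minimal profiling sketch; process_log() is a placeholder for the real work.
import cProfile
import pstats

def process_log(path):
    # Placeholder: read the file line by line; your parsing would go here.
    with open(path) as f:
        for line in f:
            pass

if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    process_log("example.log")   # one representative log file (assumed path)
    profiler.disable()

    # Show the functions where most of the time is actually spent.
    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative").print_stats(15)
```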
After you do that, if the program is CPU-bound (which I doubt; it is probably spending most of its time in I/O), then parallelism might help.
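If profiling does show the per-file work is CPU-bound, the usual approach in Python is one worker process per core, with the files farmed out to the workers (processes rather than threads, because the GIL limits CPU-bound threads in CPython). A minimal sketch, where `process_log` and the `logs/` directory are assumptions standing in for your real code and file locations:

```python
# Minimal sketch of spreading per-file work across worker processes.
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def process_log(path):
    # Placeholder for the real read/parse logic. Return the extracted data
    # instead of printing inside the worker, so output stays in one place.
    return path.name

if __name__ == "__main__":
    # Assumed location of the 300 log files.
    log_files = sorted(Path("logs").glob("*.log"))

    # 8 workers to match the 8 virtual processors mentioned in the question.
    with ProcessPoolExecutor(max_workers=8) as pool:
        for result in pool.map(process_log, log_files):
            print(result)  # printed in the parent, in file order
```

Note that if the bottleneck is disk I/O rather than CPU, adding workers like this won't help much, because all 8 processes end up waiting on the same disk.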
Upvotes: 1