Reputation: 122
I'm having trouble finding an answer to this question, and it may be due to poor phrasing.
I have a small Python program that extracts data from a large log file and then displays the data in a particular format. Nothing fancy; it just reads, parses, and prints.
It takes about a minute to do this.
Now, I want to run this across 300 files. If I put my code inside a loop that iterates over the 300 files and executes the same piece of code, one by one, it will take 300 minutes to complete. I would rather it didn't take this long.
I have 8 virtual processors on this machine, and it can handle extra load while this program is running. Can I spread the workload over these vCPUs to reduce the total runtime? If so, what is the ideal way to implement this?
It's not code I'm after; it's the theory behind it.
Thanks
Upvotes: 0
Views: 63
Reputation: 40679
Don't make parallelism your first priority. Your first priority should be making the single-threaded performance as fast as possible, and the way to do that is to find out where the time is actually going before changing anything. From your short description, it sounds like there might be juicy opportunities for speedup in the I/O and in the parsing.
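For example, a quick pass with `cProfile` on one representative file will show whether the minute is spent reading, parsing, or printing. This is a minimal sketch, where `process_log` is a hypothetical stand-in for your real read/parse/print code:

```python
# Minimal profiling sketch; process_log() is a placeholder for the real work.
import cProfile
import pstats

def process_log(path):
    # Placeholder: read the file line by line; your parsing would go here.
    with open(path) as f:
        for line in f:
            pass

if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    process_log("example.log")   # one representative log file (assumed path)
    profiler.disable()

    # Show the functions where most of the time is actually spent.
    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative").print_stats(15)
```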
After you do that, if the program is CPU-bound (which I doubt; it is probably spending most of its time in I/O), then parallelism might help.
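If profiling does show the per-file work is CPU-bound, the usual approach in Python is one worker process per core, with the files farmed out to the workers (processes rather than threads, because the GIL limits CPU-bound threads in CPython). A minimal sketch, where `process_log` and the `logs/` directory are assumptions standing in for your real code and file locations:

```python
# Minimal sketch of spreading per-file work across worker processes.
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def process_log(path):
    # Placeholder for the real read/parse logic. Return the extracted data
    # instead of printing inside the worker, so output stays in one place.
    return path.name

if __name__ == "__main__":
    # Assumed location of the 300 log files.
    log_files = sorted(Path("logs").glob("*.log"))

    # 8 workers to match the 8 virtual processors mentioned in the question.
    with ProcessPoolExecutor(max_workers=8) as pool:
        for result in pool.map(process_log, log_files):
            print(result)  # printed in the parent, in file order
```

Note that if the bottleneck is disk I/O rather than CPU, adding workers like this won't help much, because all 8 processes end up waiting on the same disk.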
Upvotes: 1