Reputation: 17231
I am parsing a folder structure that is quite heavy in terms of the number of folders and files. I have to go through all the folders and parse any files I come across. The files themselves are small (1000-2000 characters, although a few are bigger). I have two options:
- Go through all the folders and files and parse any that I come across in one big recursive loop.
- Go through all the folders and store the paths of all the files that I come across. Then, in another loop, parse the files by referring to the stored file paths.
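As a rough Python sketch of what I mean (parse_file stands in for my actual parsing code):

    import os

    def option_one(root):
        # Option 1: parse each file as soon as it is encountered.
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                parse_file(os.path.join(dirpath, name))

    def option_two(root):
        # Option 2, first pass: collect every file path.
        paths = [os.path.join(dirpath, name)
                 for dirpath, _, filenames in os.walk(root)
                 for name in filenames]
        # Option 2, second pass: parse the stored paths.
        for path in paths:
            parse_file(path)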
Which option would be better and maybe faster (the work will most likely be I/O bound, so the choice probably won't make much difference, but I thought I'd ask anyway)?
Upvotes: 2
Views: 133
Reputation: 52169
"The most readable and the most understandable" almost always means "the simplest and the easiest way." (Although some code is inherently complex. That's still not an excuse to write unreadable code.) Option 1 sounds easier to implement in my opinion, but try it for yourself. Profile for bottlenecks if it isn't fast enough.
Most likely, the actual disk I/O will take much longer than the total processor cycles or memory accesses needed for either option, so which option you take might not even be relevant. But the only way to know for sure how fast your program runs, and whether it needs improvement, is by profiling.
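For instance, in Python you could profile both variants over the same tree and compare where the time goes (option_one and option_two are hypothetical stand-ins for your two implementations):

    import cProfile

    # Sort by cumulative time to see whether I/O calls or parsing dominate.
    cProfile.run('option_one("/path/to/root")', sort='cumulative')
    cProfile.run('option_two("/path/to/root")', sort='cumulative')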
Upvotes: 7
Reputation: 48277
It depends a lot on how deep the folder structure will be and how much data you'll have to hold in memory (including number of files/filenames).
If you have an extremely deep structure, option 1 could run into a stack overflow, although given typical path length limits that's not very likely. With option 2, you will have to hold all the file names in memory, which could be a pain but probably won't actually be a problem.
Assuming the functions are reasonably simple, it will likely be easiest to call the recursive search function for each directory you find and the file parser for each valid file, all in a single loop. In Python (with parse_file standing in for your parser):
    import os

    def search_folder(curdir):
        # Parse files as they are found; recurse into subfolders.
        for name in os.listdir(curdir):
            item = os.path.join(curdir, name)
            if os.path.isfile(item):
                parse_file(item)
            elif os.path.isdir(item):
                search_folder(item)
That gives you a relatively simple and very readable structure, at the cost of potentially deep recursion. Caching file names and going through them later involves more code, will likely be less readable, and (assuming you handle directories the same way) has the same amount of recursion.
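That said, the caching variant can trade the recursion for an explicit stack; a minimal sketch for comparison, with parse_file again standing in for the parser:

    import os

    def collect_files(root):
        # Walk the tree iteratively, storing file paths instead of parsing them.
        paths, stack = [], [root]
        while stack:
            curdir = stack.pop()
            for name in os.listdir(curdir):
                item = os.path.join(curdir, name)
                if os.path.isfile(item):
                    paths.append(item)
                elif os.path.isdir(item):
                    stack.append(item)
        return paths

    # Second loop: parse the stored paths.
    for path in collect_files("."):
        parse_file(path)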
I'd go with #1, since it seems the more flexible and elegant solution.
Upvotes: 0
Reputation: 20272
The options seem to be functionally identical. I would say the main considerations should be readability and maintainability: which version is easier to understand and, later on, to change, extend, or fix bugs in.
It is also worth considering breaking the functionality into separate objects: one performs the search while the other parses the files found. Then you can run them concurrently and achieve better CPU utilization.
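A minimal sketch of that split, assuming Python's threading and queue modules (parse_file is again a stand-in; note that in CPython threads mainly help overlap I/O waits, not CPU-heavy parsing):

    import os
    import queue
    import threading

    q = queue.Queue()

    def searcher(root):
        # Producer: walk the tree and enqueue every file path found.
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                q.put(os.path.join(dirpath, name))
        q.put(None)  # sentinel: no more files are coming

    def parser():
        # Consumer: parse paths as they arrive, overlapping with the search.
        while True:
            path = q.get()
            if path is None:
                break
            parse_file(path)

    search = threading.Thread(target=searcher, args=(".",))
    parse = threading.Thread(target=parser)
    search.start(); parse.start()
    search.join(); parse.join()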
Upvotes: 0
Reputation: 18143
How about one thread that creates the list of file names to process, and another thread that reads through that list of files and uses one of a handful of worker threads to do the processing?
I don't know how many directories there are, but I'm guessing that's not the big time sink. I'd say you'd get the best performance by having a thread pool, with each thread in the pool parsing a file (once you have the list of them). Because that work is going to be so I/O bound, the threading will probably make things far more efficient.
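A sketch of that layout with Python's concurrent.futures (the worker count is a guess to tune, and parse_file is a stand-in for the real parser):

    import os
    from concurrent.futures import ThreadPoolExecutor

    # Build the list of files first, then hand it to a small pool of workers.
    files = [os.path.join(dirpath, name)
             for dirpath, _, filenames in os.walk(".")
             for name in filenames]

    with ThreadPoolExecutor(max_workers=4) as pool:
        # Consume the iterator so any exceptions from parse_file surface here.
        list(pool.map(parse_file, files))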
Upvotes: 2