Reputation: 37
Hej sharp minds!
I need your expert guidance in making some choices.
Situation is like this: 1. I have approx. 500 flat files containing from 100 to 50000 records that have to be processed. 2. Each record in the files mentioned above has to be replaced using value from the separate huge file (2-15Gb) containing 100-200 million entries.
So I thought to make the processing using multicores - one file per thread/fork.
Is that a good idea? Since each thread needs to read from same huge file? It's a bit of a problem loading it into memory do to the size? Using file::tie is an option, but is that working with threads/forks?
Need your advise how to proceed.
Thanks
Upvotes: 0
Views: 1193
Reputation: 20683
Yes, of course, using multiple cores for multi-threaded application is a good idea, because that's what those cores are for. Though it sounds like your problem involves heavy I/O, so, it might be that you will not use that much of CPU anyway.
Also since you are only going to read that big file, tie
should work perfectly. I haven't heard of problems with that. But if you are going to search that big file for each record in your smaller files, then I guess it would take you a long time despite of the number of threads you use. If data from big file can be indexed based on some key, then I would advice to put it in some NoSQL databse and access it from your program. That would probably speed up your task even more than using multiple threads/cores.
Upvotes: 2