Reputation: 311
I have to mine a large number of datasets and wanted to know if it's better to get a desktop with a GPU or to spread the workload over separate machines.
I think that with a GPU I may have to write my own code using something like the CUDA toolkit.
The number of strings on which I have to perform a regex search is on the order of millions, and I have to match against roughly 10k different keywords, so it's roughly 50 billion pattern matches in total. I want to spread the workload so that, say, a million matches run on each core.
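For illustration, here is a minimal sketch of the per-core splitting I have in mind (in Java, with hypothetical input data, and assuming the keywords can be combined into a single alternation pattern):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ParallelKeywordSearch {
    public static void main(String[] args) throws Exception {
        // Hypothetical inputs; the real strings and keywords would come from files.
        List<String> lines = Arrays.asList("foo bar", "baz qux", "hello keyword1");
        List<String> keywords = Arrays.asList("keyword1", "keyword2");

        // Combine all keywords into one alternation so each string is scanned once
        // instead of once per keyword.
        Pattern pattern = Pattern.compile(
                keywords.stream().map(Pattern::quote).collect(Collectors.joining("|")));

        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Give each core a contiguous chunk of the input and count matching lines.
        int chunkSize = (lines.size() + cores - 1) / cores;
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += chunkSize) {
            List<String> chunk = lines.subList(i, Math.min(i + chunkSize, lines.size()));
            futures.add(pool.submit(() ->
                    chunk.stream().filter(s -> pattern.matcher(s).find()).count()));
        }

        long matched = 0;
        for (Future<Long> f : futures) matched += f.get();
        pool.shutdown();
        System.out.println("Lines containing at least one keyword: " + matched);
    }
}
```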
Any suggestions would help.
Upvotes: 0
Views: 158
Reputation: 6029
Since you want to process a large dataset, Hadoop might be a solution. Hadoop implements the MapReduce programming model (originally described by Google). With Hadoop you can split your task into multiple sub-parts and let an individual machine process each part.
The workload you mentioned (~50 billion matches) can be processed on a cluster of Hadoop nodes. If you do not have many machines, you can rent them from Amazon via Elastic MapReduce:
http://aws.amazon.com/elasticmapreduce/
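As a rough illustration (not a complete job), the mapper side of such a search could look like the sketch below. The keyword list here is a hypothetical placeholder; in a real job you would load it in setup() from the job configuration or distributed cache, and pair the mapper with a summing reducer such as Hadoop's stock IntSumReducer to total the counts per keyword.

```java
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Each map task scans its own split of the input and emits (keyword, 1)
// for every keyword found in a line; the reducer sums these counts.
public class KeywordMatchMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private Pattern[] patterns;
    private Text[] keywords;

    @Override
    protected void setup(Context context) {
        // Hypothetical keyword list; load the real 10k keywords from the
        // job configuration or the distributed cache instead.
        String[] raw = {"keyword1", "keyword2"};
        patterns = new Pattern[raw.length];
        keywords = new Text[raw.length];
        for (int i = 0; i < raw.length; i++) {
            patterns[i] = Pattern.compile(Pattern.quote(raw[i]));
            keywords[i] = new Text(raw[i]);
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        for (int i = 0; i < patterns.length; i++) {
            if (patterns[i].matcher(line).find()) {
                context.write(keywords[i], ONE);
            }
        }
    }
}
```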
Upvotes: 1