Reputation: 57
I am looking for suggestions on using a distributed system to process this data. I have data from computers across the organization (laptops, desktops, tablets, etc.). The sample table contains a row for every file present on each computer. The goal is to find files whose FileName or FilePath contains any of 3,000+ keywords, i.e. wildcard pattern matching.
+-------------+----------+----------+----------+----------+
| MachineName | FileName | FilePath | FileType | FileSize |
+-------------+----------+----------+----------+----------+
The current solution runs on a beefy SQL Server but still takes hours to work through 80 million records because of the wildcard queries, e.g. FILENAME LIKE '%abc%' OR FILEPATH LIKE '%abc%', repeated for every keyword.
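For reference, the monthly job boils down to something like the sketch below. The table name, DSN, and keyword list are illustrative, and in practice the predicates have to be batched to stay under SQL Server's 2,100-parameter limit:

```python
# Illustrative sketch of the current approach: one big OR of
# non-sargable LIKE predicates, which forces a scan of all
# 80M rows no matter what indexes exist.
import pyodbc

keywords = ["abc", "payroll", "confidential"]  # 3000+ in reality

predicates = " OR ".join("FileName LIKE ? OR FilePath LIKE ?" for _ in keywords)
params = [f"%{kw}%" for kw in keywords for _ in range(2)]
sql = f"SELECT MachineName, FileName, FilePath FROM Files WHERE {predicates}"

conn = pyodbc.connect("DSN=FileInventory")  # illustrative DSN
rows = conn.cursor().execute(sql, params).fetchall()
```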
We have considered full-text indexes in SQL Server, but this activity is performed once a month and the data is then discarded, so the time and resources spent populating a full-text index do not seem worth it.
The requirement is to complete this activity in a much shorter time, so we are exploring other options.
Should it be Elasticsearch, Solr, or some other cloud-based solution? Please advise on a high-level approach.
Upvotes: 2
Views: 61
Reputation: 1820
For this use case, Elasticsearch is a good choice. It provides everything you would need: every field is indexed by default, which is why it is commonly used as a near-real-time full-text search engine.
On the other hand, Solr is a good choice too. From your requirements, Elasticsearch may offer more than you actually need. Solr is a bit older, which shows in its excellent documentation. It specializes in text search only, which is no drawback in your case, and it is scalable and optimized for high query volume, so it should suit your problem well.
I think both Elasticsearch and Solr will fulfill your needs; the choice comes down to whichever appeals to you more. If you can, the best approach is to try both and then decide.
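To make the Elasticsearch route concrete, here is a minimal sketch assuming a local cluster and the official Python client (8.x-style API); the index name, analyzer settings, and sample document are my own illustrations, not part of your setup. Indexing FileName and FilePath with an n-gram analyzer turns '%abc%'-style substring matching into cheap term lookups instead of per-query wildcard scans:

```python
# Minimal sketch, assuming Elasticsearch on localhost and the
# official Python client. Index name, analyzer settings, and
# the sample document are illustrative.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# 3-gram analyzer: substring search becomes term lookups.
es.indices.create(
    index="files",
    settings={
        "analysis": {
            "tokenizer": {"grams": {"type": "ngram", "min_gram": 3, "max_gram": 3}},
            "analyzer": {"gram_analyzer": {"tokenizer": "grams", "filter": ["lowercase"]}},
        }
    },
    mappings={
        "properties": {
            "MachineName": {"type": "keyword"},
            "FileName": {"type": "text", "analyzer": "gram_analyzer"},
            "FilePath": {"type": "text", "analyzer": "gram_analyzer"},
        }
    },
)

# Bulk-load the monthly extract (normally streamed from the database).
docs = [
    {"MachineName": "PC-001", "FileName": "abc_report.xlsx", "FilePath": "C:/Users/jdoe/Documents"},
]
helpers.bulk(es, ({"_index": "files", "_source": d} for d in docs))
es.indices.refresh(index="files")

# One search per keyword; operator 'and' requires all grams of the
# keyword to appear, approximating a substring match.
hits = es.search(
    index="files",
    query={"multi_match": {"query": "abc", "fields": ["FileName", "FilePath"], "operator": "and"}},
)
print(hits["hits"]["total"])
```

Since the data is rebuilt and discarded each month, you can create the index fresh, bulk-load once, run the 3,000 keyword searches, and drop the index afterwards; the bulk load plus queries should take minutes rather than hours at this scale.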
Upvotes: 1