Reputation: 377
I'm trying to think of the most efficient way to search a directory full of text files (possibly 2,000 files of around 150 lines each) for a keyword. If I were just searching for one keyword, performance wouldn't be much of an issue, but in my application I want to be able to search for a different keyword at a later point, possibly multiple times. So iterating over the entire file collection each time seems time-consuming, and storing everything in memory seems quite memory-expensive too.
What would be the best way to do this? I don't have access to an SQL database or anything like that, so I can't temporarily dump the contents into a database and search that periodically; it's just going to be a regular Windows application.
The most primitive approach I can think of is to dump all of the files into one huge XML file and search that, rather than iterating through all of the files in the directory each time a keyword search happens. But even that seems like it could be quite time-intensive.
I will know the directory name in advance, so I can pre-process the contents if that would help with optimisation.
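For illustration, a naive version of the pre-processing I have in mind might look like the sketch below: a one-off pass that maps each distinct word to the set of files containing it, so later searches are just dictionary lookups (the folder path and keyword are placeholders):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    // One-off pass: map each distinct word to the files that contain it.
    // Only distinct words and file paths are kept, not the full file
    // contents, so the memory footprint stays modest for ~2,000 small files.
    var index = new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    foreach (var path in Directory.GetFiles(@"C:\textfiles", "*.txt"))
    {
        foreach (Match m in Regex.Matches(File.ReadAllText(path), @"\w+"))
        {
            HashSet<string> files;
            if (!index.TryGetValue(m.Value, out files))
                index[m.Value] = files = new HashSet<string>();
            files.Add(path);
        }
    }

    // Every later keyword search is then a single dictionary lookup.
    HashSet<string> matches;
    if (index.TryGetValue("keyword", out matches))
        foreach (var file in matches)
            Console.WriteLine(file);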
Any suggestions are welcome, thanks.
Upvotes: 4
Views: 625
Reputation: 3175
As "L.B" stated, you can use Lucene.net for creating an inverted index. It is a .Net implmentation from a java library. Lucene on apache.org
Here is a small example of how to do it.
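A minimal sketch of indexing and then searching, assuming Lucene.Net 3.0.3 (the classic API); the index folder, source folder, and field names are placeholders you would change:

    using System;
    using System.IO;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using Version = Lucene.Net.Util.Version;

    // Build the index once, up front.
    var indexDir = Lucene.Net.Store.FSDirectory.Open(new DirectoryInfo(@"C:\myindex"));
    var analyzer = new StandardAnalyzer(Version.LUCENE_30);

    using (var writer = new IndexWriter(indexDir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
    {
        foreach (var path in Directory.GetFiles(@"C:\textfiles", "*.txt"))
        {
            var doc = new Document();
            // Store the path so results can show it; analyze the contents for searching.
            doc.Add(new Field("path", path, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("contents", File.ReadAllText(path), Field.Store.NO, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
    }

    // Query the index as many times as needed, without touching the files again.
    using (var searcher = new IndexSearcher(indexDir, true)) // true = read-only
    {
        var parser = new QueryParser(Version.LUCENE_30, "contents", analyzer);
        var hits = searcher.Search(parser.Parse("keyword"), 100);
        foreach (var scoreDoc in hits.ScoreDocs)
            Console.WriteLine(searcher.Doc(scoreDoc.Doc).Get("path"));
    }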
Upvotes: 0
Reputation: 33139
Why not use a cmd utility that you call from C#?
The findstr command-line utility can do what you need and it is efficient: http://technet.microsoft.com/en-us/library/bb490907.aspx
How to call it from C#: How To: Execute command line in C#, get STD OUT results
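A rough sketch of such a call (the folder, file pattern, and keyword are placeholders); /s searches subdirectories, /m prints only the names of matching files, and /i makes the match case-insensitive:

    using System;
    using System.Diagnostics;

    var psi = new ProcessStartInfo
    {
        FileName = "findstr",
        Arguments = @"/s /m /i ""keyword"" C:\textfiles\*.txt",
        RedirectStandardOutput = true,  // capture STD OUT instead of printing to a console
        UseShellExecute = false,        // required for output redirection
        CreateNoWindow = true
    };

    using (var process = Process.Start(psi))
    {
        string output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        Console.WriteLine(output); // one matching file path per line
    }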
Good luck!
Upvotes: 3