Reputation: 61
I would like to write an application in Java that opens a text file (.txt) and, using the user's input, searches for all instances of a particular word or string.
As there are probably more experienced programmers here, I would like some advice on how to go about creating such a tool. How would you build a basic text search tool?
I have been playing around with some Java classes such as File, FileOutputStream, FileInputStream, InputStreamReader, OutputStreamReader, FileReader and StreamTokenizer, and would like to know the best way to open and search a file in Java.
Thanks for any input you may have, Des.
Upvotes: 2
Views: 5346
Reputation: 80176
What do you want to do with the search results? Is it just to count the number of occurrences of a given word or phrase? What if the user types "line" and the file contains "lines"; should there be a match? Do you have to allow multiple searches on the same file?
Anyway, the point is that full-text search is a very involved subject. But there is help ;-). My suggestion is to create an in-memory index of the file to be searched using the open-source Lucene project. It is super fast and has answers to all of the above questions and much more. Here is the code to create that in-memory index. Once you have that index created, you can perform sophisticated searches.
Upvotes: 1
Reputation: 6653
I would recommend using a hash table of some kind. If your data is not changing (is this just a basic search of a static document, or is it like part of a text editor?) then Perfect Hashing is going to give you constant-time lookups. This is VERY fast. If not, maybe try Cuckoo hashing or even just linear probing.
I would read in the file using Scanner or any buffered reader, hash every word as a key to whatever additional data you want (such as line numbers / word indexes of all occurrences), then you can query the hash table super fast.
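As a rough illustration of the idea, here is a minimal sketch that uses a plain java.util.HashMap rather than perfect or cuckoo hashing. The class name WordIndex, the method build, and the choice to lower-case words and split on non-word characters are all assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: indexes every word of a text by the lines it occurs on.
class WordIndex {

    // Maps each lower-cased word to the 1-based line numbers where it occurs
    // (one entry per occurrence, so a line number can repeat).
    static Map<String, List<Integer>> build(Reader source) throws IOException {
        Map<String, List<Integer>> index = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                // Assumption: a "word" is a run of letters, digits or underscores.
                for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
                    if (!word.isEmpty()) {
                        index.computeIfAbsent(word, k -> new ArrayList<>()).add(lineNumber);
                    }
                }
            }
        }
        return index;
    }

    // Usage: java WordIndex <file> <word>
    public static void main(String[] args) throws IOException {
        Map<String, List<Integer>> index =
                build(Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8));
        String query = args[1].toLowerCase(Locale.ROOT);
        System.out.println(index.getOrDefault(query, Collections.emptyList()));
    }
}
```

Building the index costs one pass over the file; after that each lookup is a single hash probe. Note that this only finds whole words, so "line" will not match "lines".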
Edit: Here is a Java implementation of Perfect hashing for Strings: http://blog.tomgibara.com/post/438939809/minimal-perfect-hash-strings
Upvotes: 2
Reputation: 10359
For speed, I would use a BufferedReader. Something like this:
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(givenFile)));
BufferedReader is the most efficient way of reading a file, IMHO.
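To show how the reader could drive an actual search, here is a minimal sketch that scans line by line and records every match as "line:column". The class name LineSearch, the method find, and the output format are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: case-sensitive substring search, one line at a time.
class LineSearch {

    // Returns "line:column" (both 1-based) for every occurrence of needle.
    static List<String> find(Reader source, String needle) throws IOException {
        List<String> hits = new ArrayList<>();
        if (needle.isEmpty()) {
            return hits; // an empty search string would match everywhere
        }
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                int from = line.indexOf(needle);
                while (from >= 0) {
                    hits.add(lineNumber + ":" + (from + 1));
                    // Step one character forward so overlapping matches are found too.
                    from = line.indexOf(needle, from + 1);
                }
            }
        }
        return hits;
    }

    // Usage: java LineSearch <file> <text>
    public static void main(String[] args) throws IOException {
        List<String> hits =
                find(Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8), args[1]);
        hits.forEach(System.out::println);
    }
}
```

Because only one line is held in memory at a time, this works for large files, but it cannot find a match that spans a line break.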
There is also an existing tool called FileSearch, created by Keith Fenske. You can download the sources and have a look at it :)
Upvotes: 0
Reputation: 423
You can read the text file into a string and then call the split() method. See documentation. This will return an array of strings. After this you can do a search (e.g. binary search) on the array and keep doing it, removing the word you find and saving the location, until all instances have been found. After that you will have all the locations of the search string in the document. Note that binary search only works on a sorted array, so you would have to sort a copy that keeps each word's original position, or simply use a linear scan.
Here is a Wikipedia article on binary search in case you might need it: http://en.m.wikipedia.org/wiki/Binary_search_algorithm?wasRedirected=true
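A minimal sketch of the split() approach follows. It swaps in a plain linear scan for the binary search, since the array returned by split() is in document order rather than sorted. The class name SplitSearch and the whitespace-only splitting are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds a word by splitting the text on whitespace.
class SplitSearch {

    // Returns the 0-based word positions at which target occurs.
    static List<Integer> positions(String text, String target) {
        List<Integer> found = new ArrayList<>();
        // Assumption: words are separated by whitespace; punctuation stays attached.
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].equals(target)) {
                found.add(i);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(positions("the cat and the hat", "the"));
    }
}
```

The scan visits every word once, so nothing has to be removed from the array between matches.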
Upvotes: 0
Reputation: 45724
Using a File with a Scanner, and a StringBuilder, should give you a good start on that topic.
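One way these three classes could fit together is sketched below: the Scanner reads the File line by line into a StringBuilder, which is then searched with its own indexOf method. The class name ScannerSearch, the method offsets, and the UTF-8 encoding are assumptions for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper: loads the whole text, then searches it in memory.
class ScannerSearch {

    // Returns the 0-based character offsets of every occurrence of needle.
    static List<Integer> offsets(Scanner scanner, String needle) {
        StringBuilder text = new StringBuilder();
        while (scanner.hasNextLine()) {
            // Line endings are normalised to '\n', so offsets refer to this buffer.
            text.append(scanner.nextLine()).append('\n');
        }
        List<Integer> found = new ArrayList<>();
        if (needle.isEmpty()) {
            return found; // an empty search string would match everywhere
        }
        int from = text.indexOf(needle);
        while (from >= 0) {
            found.add(from);
            from = text.indexOf(needle, from + 1);
        }
        return found;
    }

    // Usage: java ScannerSearch <file> <text>
    public static void main(String[] args) throws FileNotFoundException {
        try (Scanner scanner = new Scanner(new File(args[0]), "UTF-8")) {
            System.out.println(offsets(scanner, args[1]));
        }
    }
}
```

This keeps the entire file in memory, which is fine for small text files but worth reconsidering for very large ones.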
Upvotes: 2