Des C

Reputation: 61

Searching a File's Content Using Java?

I would like to write an application in Java that lets me open a text (.txt) file and, using the user's input, search it for all instances of a particular word or string.

As there are probably more experienced programmers here, I would like some advice about how to go about creating such a tool. How would you go about creating a basic text search tool?

I have been playing around with some Java classes such as File, FileOutputStream, FileInputStream, InputStreamReader, OutputStreamWriter, FileReader and StreamTokenizer, and would like to know the optimal way to open and search a file using Java.

Thanks for any input you may have, Des.

Upvotes: 2

Views: 5346

Answers (5)

Aravind Yarram

Reputation: 80176

What do you want to do with the search results? Is it just to count the number of occurrences of a given word or phrase? What if the user types "line" and the file contains "lines"; should there be a match? Do you have to allow multiple searches on the same file?

Anyway, the point is that full-text search is a very involved subject. But there is help ;-). My suggestion is to create an in-memory index of the file to be searched using the open source Lucene project. It is very fast and has answers to all of the above questions and much more. Here is the code to create that in-memory index. Once you have created the index, you can perform sophisticated searches.

Upvotes: 1

idolize

Reputation: 6653

I would recommend using a hash table of some kind. If your data is not changing (is this just a basic search of a static document, or is it part of a text editor?), then perfect hashing will give you constant-time lookups, which is VERY fast. If the data does change, maybe try cuckoo hashing or even just linear probing.

I would read the file using a Scanner or a BufferedReader and hash every word as a key to whatever additional data you want (such as the line numbers or word indexes of all its occurrences). Then you can query the hash table very quickly.
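A minimal sketch of the word index described above, assuming a "word" is a run of letters or digits and that matching is case-insensitive. Note that it swaps in Java's standard `HashMap` (ordinary hashing with expected constant-time lookups) rather than perfect or cuckoo hashing; the class name `WordIndex` and its `build` method are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class WordIndex {

    // Maps each lower-cased word to the 1-based line numbers of its occurrences.
    // A word that appears twice on one line gets that line number twice.
    // Uses java.util.HashMap, not perfect hashing.
    public static Map<String, List<Integer>> build(Reader source) {
        Map<String, List<Integer>> index = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                // Assumption: a word is a maximal run of Unicode letters or digits.
                for (String word : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                    if (word.isEmpty()) {
                        continue; // split yields "" for blank lines or leading punctuation
                    }
                    index.computeIfAbsent(word, k -> new ArrayList<>()).add(lineNumber);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java WordIndex <file> <word>   (file assumed to be UTF-8)
        Map<String, List<Integer>> index =
                build(Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8));
        // Each query is one HashMap lookup, however often it is repeated.
        List<Integer> lines = index.getOrDefault(args[1].toLowerCase(Locale.ROOT), List.of());
        System.out.println(args[1] + " found on lines " + lines);
    }
}
```

Building the index reads the file once; after that every search is just a map lookup, which is what makes this approach attractive when the user runs many searches on the same file.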

Edit: Here is a Java implementation of Perfect hashing for Strings: http://blog.tomgibara.com/post/438939809/minimal-perfect-hash-strings

Upvotes: 2

LaGrandMere

Reputation: 10359

For speed, I would use a BufferedReader. Something like this:

BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(givenFile)));

BufferedReader is, IMHO, the most efficient way of reading a file, because it reads the underlying stream in large chunks instead of one character at a time.
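A sketch of a complete search loop built on that reader chain, assuming a case-sensitive substring search, UTF-8 input, and that matches never span a line break. `LineSearch` and `search` are illustrative names, not part of any library.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineSearch {

    // Returns "line:column" (both 1-based) for every non-overlapping,
    // case-sensitive occurrence of term, reading the input one line at a time.
    public static List<String> search(Reader source, String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("search term must not be empty");
        }
        List<String> hits = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                int col = line.indexOf(term);
                while (col >= 0) {
                    hits.add(lineNumber + ":" + (col + 1));
                    // Skip past this match so matches do not overlap.
                    col = line.indexOf(term, col + term.length());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LineSearch <file> <term>
        // Same reader chain as above, plus an explicit charset (assumed UTF-8).
        Reader in = new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8);
        for (String hit : search(in, args[1])) {
            System.out.println(hit);
        }
    }
}
```

Only one line is held in memory at a time, so this works for files larger than the available heap, at the cost of rereading the file for every new search.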

There is an existing tool named FileSearch, created by Keith Fenske. You can download the sources and have a look at it :)

Upvotes: 0

prgmast3r

Reputation: 423

You can read the text file into a String and then call its split() method (see the documentation), which returns an array of Strings. After this you can search the array (e.g. with a binary search) and keep doing so, removing each word you find and saving its location, until all instances have been found. Keep in mind that a binary search only works on a sorted array, so the words must be sorted first without losing track of their original positions. After that you will have all the locations of the search string in the document.
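A sketch of the split-then-binary-search idea. Because binary search needs sorted input, this version sorts an array of word positions by word (instead of sorting and removing the words themselves), so the original locations survive; all matches then sit next to each other and are found with a single lower-bound search. It assumes words are separated by whitespace, so punctuation stays attached ("fox," does not match "fox"); `SplitSearch` and `findPositions` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SplitSearch {

    // Splits text on whitespace and returns the 0-based word positions of every
    // exact (case-sensitive) match of target, found with a binary search.
    public static List<Integer> findPositions(String text, String target) {
        String[] words = text.trim().split("\\s+");

        // Sort word positions by their word; the words array itself is untouched.
        // Arrays.sort on objects is stable, so equal words stay in document order.
        Integer[] order = new Integer[words.length];
        for (int i = 0; i < words.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparing((Integer i) -> words[i]));

        // Binary search for the first sorted slot whose word is >= target.
        int lo = 0;
        int hi = order.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (words[order[mid]].compareTo(target) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }

        // All matches are adjacent in sorted order; collect their positions.
        List<Integer> positions = new ArrayList<>();
        for (int k = lo; k < order.length && words[order[k]].equals(target); k++) {
            positions.add(order[k]);
        }
        return positions;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SplitSearch <file> <word>   (file assumed to be UTF-8)
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(findPositions(text, args[1]));
    }
}
```

The sort costs O(n log n), so for a single query a plain linear scan over the array is just as good; the sort only pays off if the sorted positions are kept and reused across many searches, which this sketch does not do.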

Here is a wikipedia article on binary search in case you might need it: http://en.m.wikipedia.org/wiki/Binary_search_algorithm?wasRedirected=true

Upvotes: 0

Fabian Steeg

Reputation: 45724

Using a File with a Scanner, and a StringBuilder should give you a good start into that topic.
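A minimal sketch of that combination, assuming a case-sensitive search that reports character offsets into the text and a UTF-8 input file. Line endings are normalized to a single newline while reading. `ScannerSearch`, `readAll` and `findAll` are hypothetical names.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerSearch {

    // Reads the whole input into a StringBuilder one line at a time,
    // appending '\n' after each line (so "\r\n" becomes "\n").
    public static StringBuilder readAll(Scanner scanner) {
        StringBuilder text = new StringBuilder();
        while (scanner.hasNextLine()) {
            text.append(scanner.nextLine()).append('\n');
        }
        return text;
    }

    // Returns the 0-based character offsets of every non-overlapping,
    // case-sensitive occurrence of term in text.
    public static List<Integer> findAll(StringBuilder text, String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("search term must not be empty");
        }
        List<Integer> offsets = new ArrayList<>();
        int at = text.indexOf(term);
        while (at >= 0) {
            offsets.add(at);
            at = text.indexOf(term, at + term.length());
        }
        return offsets;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Usage: java ScannerSearch <file> <term>
        try (Scanner scanner = new Scanner(new File(args[0]), "UTF-8")) {
            StringBuilder text = readAll(scanner);
            System.out.println(args[1] + " found at offsets " + findAll(text, args[1]));
        }
    }
}
```

Unlike a line-by-line search, holding the whole text in one StringBuilder also finds matches that span a line break (if the search term contains "\n"), but it needs the entire file in memory.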

Upvotes: 2
