Reputation: 2125
I have thousands of XML files in different subdirectories under one root folder. My requirement is to search for a text string in all of these files, irrespective of where it appears within each XML file.
Currently I am using the BufferedReader class to read these XML files (my code looks like below):
while ((currentLine = br.readLine()) != null) {
    if (currentLine.contains("myTargetString")) {
        temp = currentLine;
        myArraylist.add(temp);
    }
}
But I know there should be a better way to search through these XML files; I just can't figure out the right API or approach.
I get one string as input, and my program should search through all the XML files and return the names of the files that contain it. Using this BufferedReader approach takes too much time.
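For reference, a self-contained version of this line-scanning approach over a directory tree might look roughly like the sketch below (the root folder name and search string are placeholders; it does a plain text match with no XML awareness):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TextSearch {

    // Walks every .xml file under root and collects the paths of files
    // whose content contains the target string on some line.
    static List<String> findFilesContaining(Path root, String target) throws IOException {
        List<Path> xmlFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            xmlFiles = paths.filter(p -> p.toString().endsWith(".xml"))
                            .collect(Collectors.toList());
        }
        List<String> matches = new ArrayList<>();
        for (Path p : xmlFiles) {
            try (Stream<String> lines = Files.lines(p)) {
                if (lines.anyMatch(line -> line.contains(target))) {
                    matches.add(p.toString());
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // "rootFolder" and "myTargetString" are placeholder values
        for (String name : findFilesContaining(Paths.get("rootFolder"), "myTargetString")) {
            System.out.println(name);
        }
    }
}
```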
Any ideas would be helpful.
Upvotes: 0
Views: 8849
Reputation: 2312
If you need to improve the speed and cannot use an index (Lucene would be my recommendation), you can filter your input first using the good old recursive grep command: grep -r <searchtext> <path> (see the question about grep on Windows). Then parse the resulting files with Java to filter out false positives (commented-out blocks, matching element names, ...). Grep is IMHO the fastest way to find text in a large number of files without an index.
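The second step (filtering out false positives) could be sketched like this, assuming you already have the candidate file paths from grep. Selecting only XPath text() nodes skips hits that occur inside comments or element names:

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class GrepFilter {

    // Returns true only when the string occurs in actual text content,
    // discarding grep hits inside comments or tag names.
    static boolean containsInTextNodes(File xmlFile, String target) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(xmlFile);
        XPath xpath = XPathFactory.newInstance().newXPath();
        // //text() selects text nodes only; comment nodes are not text nodes
        NodeList texts = (NodeList) xpath.evaluate("//text()", doc, XPathConstants.NODESET);
        for (int i = 0; i < texts.getLength(); i++) {
            if (texts.item(i).getNodeValue().contains(target)) {
                return true;
            }
        }
        return false;
    }
}
```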
Upvotes: 0
Reputation: 382
So there are two possible solutions you could use here. First, for each file, you could parse with an XML parser (there are many APIs for Java) and then use an XPath query (something like //*[text() = 'your query']) to locate an element that matches your text criteria.
Second, you could go with what JamesB suggested and use an indexed solution like Lucene: for every file in the directory, index its contents, then run a search over the index using Lucene's search API to find your text string.
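The first option can be sketched as below, using the standard javax.xml APIs. The query is spliced into the XPath expression, so this sketch assumes the search string contains no apostrophe; note that //*[text() = '...'] does an exact match on an element's text, not a substring match:

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathSearch {

    // Parses one XML file and returns true when some element has a
    // text node exactly equal to the query string.
    static boolean matches(File xmlFile, String query) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(xmlFile);
        // Assumes query has no single quotes; escape or use variables otherwise
        String expr = "//*[text() = '" + query + "']";
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        return hits.getLength() > 0;
    }
}
```

Calling matches(file, query) for each file under the root and collecting the names of those that return true gives exactly the file list the question asks for.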
Upvotes: 1