Reputation: 153
I have an issue with parsing HTML data. Java's String.indexOf() is extremely slow. Could anyone suggest a way to significantly speed it up?
while (counter2 <= found)
{
    number = Integer.toString(counter2);
    start = page.indexOf("<result" + number + ">") + 8 + number.length();
    end = page.indexOf("</result" + number + ">");
    if (start > 0 && end > 0)
    {
        buffer = page.substring(start, end);
    }
    page = page.substring(end, page.length());
    start = buffer.indexOf("<word>") + 6;
    end = buffer.indexOf("</word>");
    if (start > 0 && end > 0)
    {
        Word = buffer.substring(start, end);
    }
    start = buffer.indexOf("<vocabulary>") + 12;
    end = buffer.indexOf("</vocabulary>");
    if (start > 0 && end > 0)
    {
        Dictionary = buffer.substring(start, end);
    }
    start = buffer.indexOf("<id>") + 4;
    end = buffer.indexOf("</id>");
    if (start > 0 && end > 0)
    {
        ID = buffer.substring(start, end);
    }
    sqlDriver.createDictionaryWord("Wordlist", ID, Word, Dictionary);
    // counter = counter + 1;
    counter2 = counter2 + 1;
}
I need to make it work at least 5 times faster somehow. Thanks for any help.
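One thing that may be worth ruling out in the loop above: page.substring(end, page.length()) runs on every iteration, and on runtimes where substring copies its characters, that re-copies the rest of the page each time. Also, the start > 0 checks cannot detect a missing tag, because a constant has already been added to the -1 that indexOf() returns. Below is a minimal sketch that keeps a running position and uses indexOf(String, int) instead. The class and method names (ResultScanner, between, scan) are hypothetical, and it assumes the results are numbered from 1 and appear in order. Unlike the original, a missing field comes back as null rather than keeping the previous value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original code.
class ResultScanner {
    /** Text between open and close, searching from index 'from'; null if either tag is missing. */
    static String between(String s, String open, String close, int from) {
        int start = s.indexOf(open, from);
        if (start < 0) return null;          // check BEFORE adding the tag length
        start += open.length();
        int end = s.indexOf(close, start);
        return end < 0 ? null : s.substring(start, end);
    }

    /** Collects {id, word, vocabulary} for result1..resultN without re-slicing the page. */
    static List<String[]> scan(String page, int found) {
        List<String[]> rows = new ArrayList<>();
        int pos = 0;                          // running offset into the unchanged page
        for (int n = 1; n <= found; n++) {    // assumption: numbering starts at 1
            String open = "<result" + n + ">";
            String close = "</result" + n + ">";
            int start = page.indexOf(open, pos);
            if (start < 0) continue;          // this result is missing, skip it
            int end = page.indexOf(close, start);
            if (end < 0) continue;
            String block = page.substring(start + open.length(), end);
            rows.add(new String[] {
                between(block, "<id>", "</id>", 0),
                between(block, "<word>", "</word>", 0),
                between(block, "<vocabulary>", "</vocabulary>", 0)
            });
            pos = end + close.length();       // continue after this block
        }
        return rows;
    }
}
```

Each returned row could then be passed to sqlDriver.createDictionaryWord("Wordlist", row[0], row[1], row[2]). Timing the parsing and the database calls separately would show which of the two actually dominates.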
Upvotes: 0
Views: 144
Reputation: 153
I converted the data to XML and followed the advice to use XmlPullParser. It is a bit faster, but on some devices it still takes over a minute for a 1.7 MB file. Quite confusing.
Upvotes: 0
Reputation: 13865
Using a regex with Pattern and Matcher can be considerably faster than indexOf() for longer Strings (for smaller Strings, indexOf() is better than regex). Use your text and a regex to find the index of your String pattern.
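import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compile once and reuse; regex stands for your own pattern string.
Pattern pattern = Pattern.compile(regex);

public static void getIndices(String text, Pattern pattern) {
    Matcher matcher = pattern.matcher(text);
    // start() and end() throw IllegalStateException unless find() succeeded.
    if (matcher.find()) {
        System.out.println("Start index: " + matcher.start());
        System.out.println("End index: " + matcher.end());
    }
}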
Note that you only need to compile each regex into a Pattern object once, so don't call Pattern.compile() inside a loop.
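To find every occurrence rather than just the first, find() can be called repeatedly on the same Matcher. Here is a minimal sketch, assuming the <word> tag from the question. The class name WordExtractor and the method words are made up for illustration, and no speed-up is guaranteed without measuring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not from the original post.
class WordExtractor {
    // Compiled once as a constant; DOTALL lets '.' match line breaks inside a tag body.
    private static final Pattern WORD =
            Pattern.compile("<word>(.*?)</word>", Pattern.DOTALL);

    /** Returns the contents of every <word>...</word> pair, in document order. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {            // each call resumes after the previous match
            out.add(m.group(1));      // group 1 is the text between the tags
        }
        return out;
    }
}
```

The reluctant quantifier .*? stops at the first closing tag, so adjacent entries are not merged into one match.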
Upvotes: 1