Reputation: 2781
I am currently trying to improve the speed of my application by changing how I extract information.
I read an HTML page and extract URLs and other information from it. For this I mainly use String.contains() and String.split(), but I was wondering what the most efficient way to do this is. I looked around and tried a few alternatives, but the results are quite similar for me :/
Here is a bit of my code (some parts are only there for testing):
    Pattern p = Pattern.compile("\" title=\"Read ");
    //Pattern p2 = Pattern.compile("Online\">");
    //Pattern p3 = Pattern.compile("</a></th>");
    Pattern p4 = Pattern.compile("Online\">(.*)</a></th>");
    while ((inputLine = in.readLine()) != null)
    {
        if(inputLine.contains("<table id=\"updates\">"))
        {
            tmp = inputLine.split("<tr><th><a href=\"");
            for(String s : tmp)
            {
                if(s.contains("\" title=\"Read "))
                {
                    //url = s.split("\" title=\"Read ")[0].replace(" ", "%20");
                    //name = s.split("Online\">")[1].split("</a></th>")[0];
                    url = p.split(s)[0].replace(" ", "%20");
                    //name = p3.split(p2.split(s)[1])[0];
                    Matcher matcher = p4.matcher(s);
                    while(matcher.find())
                        name = matcher.group(1);
                    array.add(new Object(name, url));
                }
            }
            break;
        }
    }
As you can see, I tried Pattern, Matcher, split, and pattern.split() here, but I also know that there are replaceAll and replaceFirst.
In this case, what do you think is the best way to do this?
Thanks a lot.
PS: I read here: http://chrononsystems.com/blog/hidden-evils-of-javas-stringsplit-and-stringr that Pattern.split was better than split(), but I couldn't find a bigger benchmark.
----- UPDATE ----
    Pattern p1 = Pattern.compile("\" title=\"Read ");
    Pattern p2 = Pattern.compile("Online\">(.*?)</a></th>");
    Matcher matcher = p2.matcher("");
    while( (inputLine = in.readLine()) != null)
    {
        if( (tmp = inputLine.split("<tr><th><a href=\"")).length > 1 )
        {
            for(String s : tmp)
            {
                if(s.contains("\" title=\"Read "))
                {
                    url = p1.split(s)[0].replace(" ", "%20");
                    if(matcher.reset(s).find())
                        name = matcher.group(1);
                    arrays.add(new Object(name, url));
                }
            }
            break;
        }
    }
Upvotes: 1
Views: 854
Reputation: 20163
Any String method that uses regular expressions (namely matches(s), replaceAll(s,s), replaceFirst(s,s), split(s), and split(s,i)) compiles the regular expression and creates a Matcher object every time, which is very inefficient when used in a loop.
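As a rough illustration of the point above: the Javadoc for String.split(regex) describes it as yielding the same result as Pattern.compile(regex).split(str), so hoisting the compile step out of the loop is usually the first change to try. The sketch below assumes a made-up input line shaped like the one in the question; the class and method names (PrecompiledSplit, viaString, viaPattern) are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PrecompiledSplit {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern TITLE = Pattern.compile("\" title=\"Read ");

    // Convenience form: the regex is handled again on every call.
    static String[] viaString(String s) {
        return s.split("\" title=\"Read ");
    }

    // Precompiled form: only the split itself runs on each call.
    static String[] viaPattern(String s) {
        return TITLE.split(s);
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the real page.
        String cell = "/manga/one piece\" title=\"Read One Piece Online\">One Piece</a></th>";
        String[] parts = viaPattern(cell);
        System.out.println(parts[0].replace(" ", "%20"));
        // Both forms are expected to produce the same pieces.
        System.out.println(Arrays.equals(viaString(cell), parts));
    }
}
```

Whether this is measurably faster depends on the regex and the input size, so it is worth timing both forms on real data.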
If you need to speed things up, the first step is to stop using the String functions, and instead use Pattern and Matcher directly. Here's an answer where I demonstrate this.
And ideally, you should create only a single Matcher object, as I describe in this answer.
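A minimal sketch of the single-Matcher idea, assuming input strings shaped like the ones in the question (the class name ReusedMatcher, the method extractNames, and the sample data are hypothetical). The Matcher is created once against an empty string and then pointed at each new input with reset(CharSequence):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReusedMatcher {
    // Reluctant group so the match stops at the first closing tag.
    private static final Pattern NAME = Pattern.compile("Online\">(.*?)</a></th>");

    static List<String> extractNames(List<String> cells) {
        List<String> names = new ArrayList<>();
        // One Matcher for the whole loop. Unlike Pattern, a Matcher is not
        // thread-safe, so it is kept local to this method.
        Matcher m = NAME.matcher("");
        for (String cell : cells) {
            // reset(...) swaps in the new input and returns the same Matcher.
            if (m.reset(cell).find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the real page.
        List<String> cells = Arrays.asList(
            "/one\" title=\"Read One Online\">One</a></th>",
            "a fragment without the marker",
            "/two\" title=\"Read Two Online\">Two</a></th>");
        System.out.println(extractNames(cells));
    }
}
```

Because a Matcher holds mutable state, sharing one across threads would need external synchronization or one instance per thread.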
For more regex information, please check out the FAQ.
Upvotes: 2