Reputation: 34793
I have the following HTML code segment:
<br>
Date: 2010-06-20, 1:37AM PDT<br>
<br>
Daddy: <a href="...">www.google.com</a>
<br>
I want to extract
Date: 2010-06-20, 1:37AM PDT
and
Daddy: <a href="...">www.google.com</a>
using a Java regex.
What regex should I use?
Upvotes: 1
Views: 171
Reputation: 383966
This should give you a nice starting point:
String text =
    " <br>\n" +
    " Date: 2010-06-20, 1:37AM PDT<br> \n" +
    " <br> \n" +
    "Daddy: <a href=\"...\">www.google.com</a> \n" +
    "<br>";
// Split on runs of <br> tags, including any whitespace around them.
String[] parts = text.split("(?:\\s*<br>\\s*)+");
for (String part : parts) {
    System.out.println("[" + part + "]");
}
This prints (as seen on ideone.com):
[]
[Date: 2010-06-20, 1:37AM PDT]
[Daddy: <a href="...">www.google.com</a>]
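This uses String[] String.split(String regex). The regex pattern means "one or more occurrences of <br>, each with optional leading or trailing whitespace". The empty first element appears because the text starts with a delimiter: split discards trailing empty strings but keeps a leading one.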
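If the empty first element in the output above is unwanted, one option that needs no extra library is to filter it out after splitting. The following is only a minimal sketch: the class name BrSplitDemo and the helper splitOnBr are hypothetical names introduced for illustration, and the regex is the same one used above.

```java
import java.util.ArrayList;
import java.util.List;

class BrSplitDemo {
    // Hypothetical helper (not part of the original answer): splits on runs
    // of <br> tags with surrounding whitespace and skips empty pieces.
    static List<String> splitOnBr(String html) {
        List<String> result = new ArrayList<>();
        for (String part : html.split("(?:\\s*<br>\\s*)+")) {
            // split() keeps a leading empty string when the input starts
            // with a delimiter, so drop empty pieces explicitly.
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text =
            " <br>\n" +
            " Date: 2010-06-20, 1:37AM PDT<br> \n" +
            " <br> \n" +
            "Daddy: <a href=\"...\">www.google.com</a> \n" +
            "<br>";
        for (String part : splitOnBr(text)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

With the sample text, this should print only the Date and Daddy lines, without the leading [].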
You can also use Splitter from Guava. It is arguably more readable, and it can drop empty results with omitEmptyStrings().
// Requires Guava: import com.google.common.base.Splitter;
Splitter splitter = Splitter.on("<br>").trimResults().omitEmptyStrings();
for (String part : splitter.split(text)) {
    System.out.println("[" + part + "]");
}
This prints:
[Date: 2010-06-20, 1:37AM PDT]
[Daddy: <a href="...">www.google.com</a>]
Upvotes: 1