Amit

Reputation: 34793

Java - Regex for the given string

I have the following HTML fragment:

        <br>
        Date: 2010-06-20,  1:37AM PDT<br>
        <br>
        Daddy: <a href="...">www.google.com</a>
        <br>

I want to extract

Date: 2010-06-20, 1:37AM PDT

and

Daddy: <a href="...">www.google.com</a>

using a Java regex.

What regex should I use?

Upvotes: 1

Views: 171

Answers (1)

polygenelubricants

Reputation: 383966

This should give you a nice starting point:

    String text = 
    "        <br>\n" +
    "        Date: 2010-06-20,  1:37AM PDT<br>   \n" +
    "   <br>    \n" +
    "Daddy: <a href=\"...\">www.google.com</a>   \n" +
    "<br>";

    String[] parts = text.split("(?:\\s*<br>\\s*)+");
    for (String part : parts) {
        System.out.println("[" + part + "]");
    }

This prints (as seen on ideone.com):

    []
    [Date: 2010-06-20,  1:37AM PDT]
    [Daddy: <a href="...">www.google.com</a>]

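This uses String.split(String regex), which returns a String[]. The regex pattern means "one or more occurrences of <br>, each with optional leading or trailing whitespace". The empty first element in the output appears because the text begins with a delimiter: split() keeps a leading empty string but discards trailing ones.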
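
If the empty first element is unwanted, one option is to filter it out after splitting. The following is only a minimal sketch using the standard library; the class name BrSplitDemo and the helper splitOnBr are hypothetical names chosen for illustration, and it assumes the delimiter is always written exactly as <br>.

```java
import java.util.ArrayList;
import java.util.List;

class BrSplitDemo {
    // Hypothetical helper: splits on runs of <br> (with surrounding
    // whitespace) and drops any piece that is empty after trimming.
    static List<String> splitOnBr(String html) {
        List<String> result = new ArrayList<>();
        for (String part : html.split("(?:\\s*<br>\\s*)+")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text =
            "        <br>\n" +
            "        Date: 2010-06-20,  1:37AM PDT<br>   \n" +
            "   <br>    \n" +
            "Daddy: <a href=\"...\">www.google.com</a>   \n" +
            "<br>";
        for (String part : splitOnBr(text)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Like the original pattern, this will not match variants such as <br/> or <BR>; for anything beyond a fixed, simple fragment, an HTML parser is likely a safer choice than a regex.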


Guava alternative

You can also use Splitter from Guava. It is arguably more readable, and omitEmptyStrings() drops the empty pieces for you.

    Splitter splitter = Splitter.on("<br>").trimResults().omitEmptyStrings();
    for (String part : splitter.split(text)) {
        System.out.println("[" + part + "]");
    }

This prints:

    [Date: 2010-06-20,  1:37AM PDT]
    [Daddy: <a href="...">www.google.com</a>]

Upvotes: 1