bobo

Reputation: 8727

How do I write a regular expression to extract this data?

<br>Aggie<br><br>John<br><p>Hello world</p><br>Mary<br><br><b>Peter</b><br>

I'd like to create a regexp that safely matches these:

<br>Aggie<br>
<br>John<br>
<br>Mary<br>
<br><b>Peter</b><br>

It is possible that other tags (e.g. <i>, <strike>, etc.) appear between each pair of <br> tags, and they have to be captured just like <br><b>Peter</b><br>.

What should the regexp look like?

Upvotes: 0

Views: 116

Answers (3)

Tim Pietzcker

Reputation: 336158

<br>.*?<br>

will match anything from one <br> tag to the closest following one.

The main problem with parsing HTML using regexes is that regexes can't handle arbitrarily nested structures. This is not a problem in your example.
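
For illustration, here is a minimal sketch of that pattern applied to the sample string. Python and the variable names are assumptions; the question does not name a language. Two caveats: each match consumes both of its <br> tags, so two items separated by only a single <br> would not both be found, and . does not match newlines unless a flag such as re.DOTALL is set.

```python
import re

html = ("<br>Aggie<br><br>John<br><p>Hello world</p>"
        "<br>Mary<br><br><b>Peter</b><br>")

# Lazy quantifier: stop at the closest following <br>.
matches = re.findall(r"<br>.*?<br>", html)

for m in matches:
    print(m)
# <br>Aggie<br>
# <br>John<br>
# <br>Mary<br>
# <br><b>Peter</b><br>
```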

Upvotes: 1

Aaron Digulla

Reputation: 328604

Split the string at (<br>)+. You'll get empty strings at the beginning and the end of the result, so you need to remove them, too.

If you want to preserve the <br>, then this is not possible unless you know that there is one before and after each element in the result.
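
A rough sketch of the split approach, assuming Python (the question does not name a language). The group is written as non-capturing, (?:<br>)+, because Python's re.split would otherwise return the captured separators as well. Note that this also returns <p>Hello world</p>, which the question does not want, so it would need to be filtered out separately.

```python
import re

html = ("<br>Aggie<br><br>John<br><p>Hello world</p>"
        "<br>Mary<br><br><b>Peter</b><br>")

# Split on runs of one or more <br> tags.
parts = re.split(r"(?:<br>)+", html)

# Drop the empty strings produced by the leading and trailing <br>.
items = [p for p in parts if p]

print(items)
# ['Aggie', 'John', '<p>Hello world</p>', 'Mary', '<b>Peter</b>']
```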

Upvotes: 0

RC.

Reputation: 28207

If you learn one thing on SO, let it be: "Do not parse HTML with a regex". Use an HTML parser.
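
As one possible illustration, here is a sketch using Python's standard-library html.parser. Python is an assumption, and the BrSegmentParser class is a hypothetical helper written for this example, not part of any library. It collects whatever markup sits between consecutive <br> tags, so it also returns <p>Hello world</p>; deciding which fragments to keep is left to the caller.

```python
from html.parser import HTMLParser


class BrSegmentParser(HTMLParser):
    """Hypothetical helper: collects markup between consecutive <br> tags."""

    def __init__(self):
        super().__init__()
        self.segments = []     # finished fragments
        self._buffer = []      # pieces seen since the last <br>
        self._seen_br = False  # ignore anything before the first <br>

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            # A <br> closes the current fragment, if there is one.
            if self._seen_br and self._buffer:
                self.segments.append("".join(self._buffer))
            self._buffer = []
            self._seen_br = True
        else:
            # Keep the tag text exactly as it appeared in the input.
            self._buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "br":  # <br/> and </br> carry no content
            self._buffer.append(f"</{tag}>")

    def handle_data(self, data):
        self._buffer.append(data)


html = ("<br>Aggie<br><br>John<br><p>Hello world</p>"
        "<br>Mary<br><br><b>Peter</b><br>")

parser = BrSegmentParser()
parser.feed(html)
parser.close()

print(parser.segments)
# ['Aggie', 'John', '<p>Hello world</p>', 'Mary', '<b>Peter</b>']
```

Unlike the regex, this keeps working when tags carry attributes or are written as <BR> or <br/>, since the parser normalizes tag names before the handlers see them.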

Upvotes: 6
