Reputation: 5896
I'm attempting to non-greedily parse out TD tags. I'm starting with something like this:
<TD>stuff<TD align="right">More stuff<TD align="right>Other stuff<TD>things<TD>more things
I'm using the below as my regex:
Regex.Split(tempS, @"\<TD[.\s]*?\>");
The split returns the following elements:
""
"stuff<TD align="right">More stuff<TD align="right>Other stuff"
"things"
"more things"
Why is it not splitting that first full result (the one starting with "stuff")? How can I adjust the regex to split on all instances of the TD tag with or without parameters?
Upvotes: 41
Views: 34579
Reputation: 18485
* Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Upvotes: 19
Reputation: 85865
The regex you want is <TD[^>]*>:
< # Match opening tag
TD # Followed by TD
[^>]* # Followed by anything not a > (zero or more)
> # Closing tag
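As a quick check, here is a minimal sketch that runs this pattern over the sample string from the question. The names TdSplitDemo and SplitOnTd are made up for illustration; only Regex.Split and the input come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TdSplitDemo
{
    // Hypothetical helper: splits on every <TD ...> opening tag, with or without attributes.
    public static string[] SplitOnTd(string html)
    {
        // [^>]* consumes everything up to the first '>', so each match ends with its own tag.
        return Regex.Split(html, @"<TD[^>]*>");
    }

    public static void Main()
    {
        // Sample input copied from the question, including the unbalanced quote.
        string tempS = "<TD>stuff<TD align=\"right\">More stuff<TD align=\"right>Other stuff<TD>things<TD>more things";

        foreach (string part in SplitOnTd(tempS))
        {
            Console.WriteLine("\"" + part + "\"");
        }
    }
}
```

The first element is still an empty string, because the input begins with a delimiter. Filter out empty entries afterwards if you don't need it.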
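Note: . matches any character except a newline (so it already covers spaces and tabs), which makes [.\s]*? redundant. It is also wrong: inside a character class the dot loses its special meaning, so [.] matches only a literal ., and [.\s] can never get past the letters in align. If you prefer the lazy form, use .*? instead.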
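To see the difference concretely, the sketch below runs the question's pattern, the lazy .*? form, and a greedy .* form against one short input. PatternCheck and MatchedText are illustrative names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Hypothetical helper: returns the text of the first match, or a marker when nothing matches.
    public static string MatchedText(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "(no match)";
    }

    public static void Main()
    {
        string input = "<TD align=\"right\">More stuff<TD>things";

        // [.\s] allows only literal dots and whitespace, so the letters in "align"
        // block the first tag; the first match is the bare <TD> further along.
        Console.WriteLine(MatchedText(input, @"<TD[.\s]*?>"));

        // Lazy .*? stops at the first '>' it can reach.
        Console.WriteLine(MatchedText(input, @"<TD.*?>"));

        // Greedy .* runs on to the last '>' on the line.
        Console.WriteLine(MatchedText(input, @"<TD.*>"));
    }
}
```

The greedy form swallows everything between the first tag and the last '>', which is why the quantifier has to be lazy (or replaced by [^>]*) when splitting on tags.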
Upvotes: 17