Reputation: 3455
This is only for a small Android program I am messing with, so I only need to match one or two tags.
I have one HTML tag whose contents I can get, which is "FC-Cologne". I use this code to get it:
Pattern pattern = Pattern.compile("report\">(.*?)</a>",Pattern.MULTILINE);
Here is the HTML tag I can get to work:
<a href="/match-menu/3405570/first-team/fc-cologne=report"> FC Cologne</a>
But I can't get this tag. I don't know whether it is because of the space after the word "opposition" and/or the quotes inside the HTML tag, since neither is in the first tag.
This is the one I can't get to work:
<td class="bold opposition "> "Olympiacos" </td>
This is the code I am trying:
Pattern pattern = Pattern.compile("opposition \">(.*?)</td>",Pattern.MULTILINE);
I have tried replacing the space " " with an empty string "", and I have tried \s where the space is, but I get nothing.
I would appreciate it if anyone could help me.
Upvotes: 0
Views: 2553
Reputation: 12662
I believe this is what you're looking for:
<(\w+)\s*(?:\w+(?:=(?:'(?:[^']|(?<=\\)')*'|"(?:[^"]|(?<=\\)")*"))?\s*)*>(.*?)</\1\s*>
Use the second group to get the contents of the tag (the first group is the tag name). Note that this does not work recursively: nested elements are captured as part of the second group, so you will need to apply the regex again to the second group of each match, and keep doing so until there are no more matches.
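Here is a rough sketch of how the regex above could be used from Java, assuming the input is one of the two tags from the question. The class name TagContents and the method firstContents are made up for illustration, and Pattern.DOTALL is my own addition so that tag contents may span several lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagContents {
    // The regex from this answer, escaped as a Java string literal.
    // Group 1 is the tag name, group 2 is everything between the tags.
    // DOTALL is an assumption: it lets '.' cross line breaks.
    private static final Pattern TAG = Pattern.compile(
        "<(\\w+)\\s*(?:\\w+(?:=(?:'(?:[^']|(?<=\\\\)')*'|\"(?:[^\"]|(?<=\\\\)\")*\"))?\\s*)*>(.*?)</\\1\\s*>",
        Pattern.DOTALL);

    // Returns the contents of the first matching element with surrounding
    // whitespace trimmed, or null when nothing matches.
    static String firstContents(String html) {
        Matcher m = TAG.matcher(html);
        return m.find() ? m.group(2).trim() : null;
    }

    public static void main(String[] args) {
        String td = "<td class=\"bold opposition \"> \"Olympiacos\" </td>";
        String a = "<a href=\"/match-menu/3405570/first-team/fc-cologne=report\"> FC Cologne</a>";
        System.out.println(firstContents(td));
        System.out.println(firstContents(a));
    }
}
```

Note that trim() only removes the surrounding whitespace. The literal quote characters around Olympiacos are part of the tag's text, so they stay in the result unless you strip them separately.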
Upvotes: 0
Reputation: 6424
Unless you have a typo in one of the two, < /td> has a space after the < and the </td> in your regex doesn't.
Adding a space to the regex after the < caused the match to succeed in RegexBuddy.
Update: Seems the space is not in the tag the OP is working with.
In RegexBuddy I have the pattern (copied as a Java String)
"opposition \">(.*?)</td>"
which matches the html
< td class="bold opposition "> "Olympiacos" </td>
giving a match of
opposition "> "Olympiacos" </td>
and Group 1 of
"Olympiacos" <--Line ends there.
Upvotes: 2
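To double-check the RegexBuddy result in Java itself, something like the following sketch could be used. The class name OppositionMatch and the LENIENT pattern are my own illustration, not part of the question; LENIENT simply allows optional whitespace around the closing quote, inside the closing tag, and around the captured text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OppositionMatch {
    // The pattern from the question, unchanged apart from dropping the
    // MULTILINE flag, which only affects ^ and $ and so does nothing here.
    static final Pattern STRICT = Pattern.compile("opposition \">(.*?)</td>");

    // Hypothetical variant that tolerates extra or missing whitespace,
    // including a closing tag written as "< /td>".
    static final Pattern LENIENT = Pattern.compile(
        "opposition\\s*\">\\s*(.*?)\\s*<\\s*/\\s*td\\s*>", Pattern.DOTALL);

    // Returns group 1 of the first match, or null when there is no match.
    static String extract(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<td class=\"bold opposition \"> \"Olympiacos\" </td>";
        System.out.println("[" + extract(STRICT, html) + "]");
        System.out.println("[" + extract(LENIENT, html) + "]");
    }
}
```

With the tag exactly as posted, STRICT does match, and group 1 keeps the leading and trailing spaces around "Olympiacos". LENIENT returns the same text without those spaces, and it also still matches when the closing tag is written as < /td>, which STRICT does not.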