Reputation: 14849
I am no regex expert.
I need to extract a certain number out of an HTML table.
An example:
<td>13</td><td>
</td><td align="right">29.543</td>
<td align="right">1.777</td>
<td align="right">2.588</td>
</tr><tr><td><a href="player.php?p=84668" >Caterdamus</a></td>
<td>7</td><td>
Meister</td><td align="right">9.874</td>
<td align="right">1.716</td>
<td align="right">5.791</td>
</tr><tr><td><a href="player.php?p=87216" >grappa</a></td>
<td>2</td><td>
</td><td align="right">1.044</td>
<td align="right">21</td>
<td align="right">146</td>
</tr></table>
The pattern looks like this:
<td>13</td><td>
<td>7</td><td>
<td>2</td><td>
How do I extract the numbers from the text and store them in a variable? Hint: the numbers are positive integers.
Thanks:)
Upvotes: 1
Views: 370
Reputation: 116187
I wouldn't use regular expressions to parse HTML or XML. Instead, I would load the document into an HTML DOM parser - you can find several open source ones here. I can't vouch for any of these - I've never worked with anything other than XML in Java.
Upvotes: 8
Reputation: 17554
I don't know Java regex exactly, but I'd suggest something like
/<td>(\d+)<\/td><td>/
since regex syntax is quite similar across languages.
Explanations:
( ... ) captures the content inside it into the regex's return variables (capture groups).
\d represents a digit.
+ stands for one or more occurrences of the token to its left.
Since you only have positive integers, you don't have to care about signs or decimal points.
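For what it's worth, here is a minimal sketch of how that pattern might be applied in Java with java.util.regex. The class name LevelExtractor, the method name extractNumbers, and the shortened sample string are made up for illustration and are not from the question. Java patterns are plain strings without the surrounding / delimiters, so the forward slash needs no escaping, while the backslash in \d has to be doubled inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class LevelExtractor {
    // Matches "<td>digits</td><td>"; capture group 1 holds the digits.
    private static final Pattern CELL = Pattern.compile("<td>(\\d+)</td><td>");

    // Returns every number found in a plain <td>...</td><td> cell, in document order.
    static List<Integer> extractNumbers(String html) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            // Assumes the values fit in an int; parseInt throws otherwise.
            numbers.add(Integer.parseInt(m.group(1)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Shortened copy of the table fragment from the question.
        String html = "<td>13</td><td>\n"
                + "</td><td align=\"right\">29.543</td>\n"
                + "</tr><tr><td><a href=\"player.php?p=84668\" >Caterdamus</a></td>\n"
                + "<td>7</td><td>\n"
                + "Meister</td><td align=\"right\">9.874</td>\n"
                + "</tr><tr><td><a href=\"player.php?p=87216\" >grappa</a></td>\n"
                + "<td>2</td><td>\n"
                + "</td><td align=\"right\">1.044</td>\n"
                + "</tr></table>";
        System.out.println(extractNumbers(html)); // prints [13, 7, 2]
    }
}
```

The cells with align="right" are skipped because the pattern only matches a bare <td> tag. If the markup ever changes (extra attributes, whitespace inside the tag), the pattern would need adjusting, which is one reason a real HTML parser tends to be more robust.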
Upvotes: 3