n00ki3

Reputation: 14849

RegEx: Extract Number out of Source Code

I am no regex expert. I need to extract certain numbers from an HTML table.
An example:

<td>13</td><td>
  </td><td align="right">29.543</td>
  <td align="right">1.777</td>
  <td align="right">2.588</td>
</tr><tr><td><a href="player.php?p=84668" >Caterdamus</a></td>
  <td>7</td><td>
  Meister</td><td align="right">9.874</td>
  <td align="right">1.716</td>
  <td align="right">5.791</td>
</tr><tr><td><a href="player.php?p=87216" >grappa</a></td>
  <td>2</td><td>
  </td><td align="right">1.044</td>
  <td align="right">21</td>
  <td align="right">146</td>
</tr></table>

The pattern looks like this:

<td>13</td><td>
<td>7</td><td>
<td>2</td><td>

How do I extract these numbers from the text and store them in a variable? Hint: the numbers are positive integers.

Thanks:)

Upvotes: 1

Views: 370

Answers (3)

Thomas Owens

Reputation: 116187

I wouldn't use regular expressions to parse HTML or XML. Instead, I would load the document into an HTML DOM parser - you can find several open source ones here. I can't vouch for any of these - I've never worked with anything other than XML in Java.

Upvotes: 8

Etan

Reputation: 17554

I don't know Java regex exactly, but I'd suggest something like

/<td>(\d+)<\/td><td>/

since regex syntax is quite similar across languages.

Explanations

  • ( ... ) captures the text matched inside it as a group that can be read back after the match
  • \d matches a single digit
  • + stands for one or more occurrences of the token on its left side

Since the numbers are always positive integers, you don't have to care about signs or decimal points.
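To show how the pattern above might carry over to Java, here is a minimal sketch. The class name `TdNumberExtractor` and the method `extract` are made up for illustration and are not from the question. In Java there are no `/.../` delimiters, the forward slash needs no escaping, and the backslash in `\d` has to be doubled inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for this example.
class TdNumberExtractor {
    // Java string "\\d+" is the regex \d+; the group (...) captures the digits.
    private static final Pattern CELL = Pattern.compile("<td>(\\d+)</td><td>");

    // Collects every captured number, in the order it appears in the input.
    static List<Integer> extract(String html) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            // group(1) is the text matched by the first pair of parentheses.
            numbers.add(Integer.parseInt(m.group(1)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Shortened excerpt of the table from the question.
        String html = "<td>13</td><td>\n  </td><td align=\"right\">29.543</td>\n"
                + "</tr><tr><td><a href=\"player.php?p=84668\" >Caterdamus</a></td>\n"
                + "  <td>7</td><td>\n  Meister</td><td align=\"right\">9.874</td>";
        System.out.println(extract(html)); // prints [13, 7]
    }
}
```

The result is a `List<Integer>`, so each number is available as a regular variable afterwards. Note that `Integer.parseInt` would throw on values too large for an `int`, which the sample data does not contain.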

Upvotes: 3

eflorico

Reputation: 3629

<td>(\d+)</td>

should do the job.
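As a rough check of why this shorter pattern is enough for the sample table, the sketch below (class and method names are hypothetical) shows that it only matches cells whose opening tag is exactly `<td>`, so the `align="right"` cells and the decimal values are skipped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for this example.
class PlainCellMatch {
    // Matches an attribute-free cell that contains nothing but digits.
    static final Pattern PLAIN_CELL = Pattern.compile("<td>(\\d+)</td>");

    // Returns the first captured integer, or -1 if no cell matches.
    static int firstNumber(String html) {
        Matcher m = PLAIN_CELL.matcher(html);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }
}
```

For example, `PlainCellMatch.firstNumber("<td>2</td><td>")` returns 2, while `PlainCellMatch.firstNumber("<td align=\"right\">21</td>")` returns -1 because the opening tag carries an attribute.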

Upvotes: 2
