Reputation: 2593
I am trying to pull some info out of an HTML table. Here is my regex:
<tr>
<td>([^<]+)<i><a href="([^<]+)" title="([^<]+)">([^<]+)<\/a><\/i><sup id="([^<]+)" class="([^<]+)"><a href="([^<]+)"><span>[<\/span>1<span>]<\/span><\/a><\/sup><\/td>
<td><a href="([^<]+)" title="([^<]+)">([^<]+)<\/a><\/td>
<td><a href="([^<]+)" title="([^<]+)">([^<]+)<\/a><\/td>
<td>([^<]+)<\/td>
<td>([^<]+)<\/td>
</tr>
Here is the sample HTML:
<tr>
<td><i><a href="/wiki/3Xtreme" title="3Xtreme">3Xtreme</a></i><sup id="cite_ref-18" class="reference"><a href="#cite_note-18"><span>[</span>18<span>]</span></a></sup></td>
<td><a href="/wiki/989_Studios" title="989 Studios">989 Studios</a></td>
<td><a href="/wiki/989_Studios" title="989 Studios">989 Studios</a></td>
<td>1999-03-31<sup>NA</sup></td>
<td>NA</td>
</tr>
For now I just want the regex to find matches in the data. Can you see any reason why it would not match this?
For all the haters: I don't care about your opinions on whether I should use regex on HTML or not. For this case it will work great. I have one page, and the data I need is in a table. Once I get the data I will save it to my DB and never have to use the regex again. So if your comment or answer is about your opinion on using regex with HTML, don't post.
Upvotes: 0
Views: 63
Reputation: 8293
The second line of your regex:
<td>([^<]+)<i>
cannot hope to match:
<td><i>
because '+' is equivalent to '{1,}' and so requires at least one character, while there is nothing between those two tags in your HTML. I didn't check the rest of your regex, but as written it can't match.
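To make the point concrete, here is a minimal Python sketch (Python's `re` module is my choice for illustration; the question does not say which regex engine is used). The variable names are hypothetical. It runs the opening of the pattern against the opening of the sample cell, once with '+' and once with '*'.

```python
import re

# Start of the first cell from the sample HTML in the question.
cell = '<td><i><a href="/wiki/3Xtreme" title="3Xtreme">3Xtreme</a></i>'

# '+' needs at least one character between <td> and <i>, so this fails.
plus_match = re.search(r'<td>([^<]+)<i>', cell)

# '*' allows zero characters, so the empty gap is accepted.
star_match = re.search(r'<td>([^<]*)<i>', cell)

print(plus_match)                  # None
print(repr(star_match.group(1)))   # ''
```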
Edit:
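Please also correct the "([^<]+)" groups inside quoted attributes
and so on (I hope you see why: inside quotes the capture should stop at the closing quote, not at the next '<'). And edit the regex in your question once you have corrected it.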
Edit 2:
Seeing as it's quite a disaster (sorry, but it's the truth :/): please consider replacing every ([^<]+) that can't cover all your cases with a simple lazy (.*?)
Edit 3:
[ and ] must be escaped as \[ and \]. (\d+ will help you catch the numbers.)
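Putting the three edits together, the following is one possible corrected pattern, sketched in Python (an assumption on my part, since the question does not name a language). The names `sample`, `row_parts`, `row_pattern` and `match` are hypothetical. Note that the date cell in the sample contains a nested <sup>, so ([^<]+) could not match that cell either, while (.*?) can. I have only checked this against the single sample row from the question.

```python
import re

# The sample row from the question, unchanged.
sample = '''<tr>
<td><i><a href="/wiki/3Xtreme" title="3Xtreme">3Xtreme</a></i><sup id="cite_ref-18" class="reference"><a href="#cite_note-18"><span>[</span>18<span>]</span></a></sup></td>
<td><a href="/wiki/989_Studios" title="989 Studios">989 Studios</a></td>
<td><a href="/wiki/989_Studios" title="989 Studios">989 Studios</a></td>
<td>1999-03-31<sup>NA</sup></td>
<td>NA</td>
</tr>'''

# One entry per line of the original regex: lazy groups, escaped
# brackets, and \d+ instead of a hard-coded footnote number.
row_parts = [
    r'<tr>',
    r'<td>(.*?)<i><a href="(.*?)" title="(.*?)">(.*?)</a></i>'
    r'<sup id="(.*?)" class="(.*?)"><a href="(.*?)">'
    r'<span>\[</span>\d+<span>\]</span></a></sup></td>',
    r'<td><a href="(.*?)" title="(.*?)">(.*?)</a></td>',
    r'<td><a href="(.*?)" title="(.*?)">(.*?)</a></td>',
    r'<td>(.*?)</td>',
    r'<td>(.*?)</td>',
    r'</tr>',
]

# Allow any whitespace (including newlines) between the lines.
row_pattern = re.compile(r'\s*'.join(row_parts))

match = row_pattern.search(sample)
print(match.group(4))    # 3Xtreme
print(match.group(14))   # 1999-03-31<sup>NA</sup>
print(match.group(15))   # NA
```

Without the DOTALL flag, '.' does not cross newlines, so each lazy group stays on its own line of the row. A row whose cells wrap across lines would need a different approach.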
Upvotes: 1
Reputation: 1281
<span>[<\/span>1<span>]<\/span>
Lots of problems here: you must escape the brackets, otherwise everything between [ and ] is read as a character class, and obviously a literal 1 won't match 18.
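A small Python sketch of just this fragment, assuming Python's `re` module (the names `fragment`, `unescaped` and `escaped` are made up for the example):

```python
import re

# The footnote marker from the sample HTML.
fragment = '<span>[</span>18<span>]</span>'

# Unescaped: "[</span>1<span>]" is parsed as one character class,
# so the pattern means "<span>, one char from that set, </span>".
unescaped = re.compile(r'<span>[</span>1<span>]</span>')

# Escaped brackets are literal, and \d+ accepts a number of any length.
escaped = re.compile(r'<span>\[</span>\d+<span>\]</span>')

print(unescaped.search(fragment))        # None
print(escaped.search(fragment).group())  # <span>[</span>18<span>]</span>
```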
Upvotes: 1