Reputation: 1035
I want to find a 6-digit number in my webpage:
<td style="width:40px;">705214</td>
My code is:
s = f.read()
m = re.search(r'\A>\d{6}\Z<', s)
l = m.group(0)
Upvotes: 2
Views: 149
Reputation: 840
I think you want something like this:
m = re.search(r'>(\d{6})<', s)
l = m.group(1)
The ( ) around \d{6} mark a capturing group, so m.group(1) returns only the digits, without the surrounding > and <.
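For reference, here is a small sketch of why the pattern in the question finds nothing. \A matches only at the very start of the string and \Z only at the very end, so a pattern that expects a < after \Z can never match. The sample string is copied from the question; the variable names original and fixed are just for illustration.

```python
import re

s = '<td style="width:40px;">705214</td>'

# \A anchors to the start of the string and \Z to the end, so this
# pattern would need a '<' after the end of the string: it cannot match.
original = re.search(r'\A>\d{6}\Z<', s)

# Without the anchors, the pattern may match anywhere in the string.
fixed = re.search(r'>(\d{6})<', s)

print(original)        # None
print(fixed.group(0))  # >705214<
print(fixed.group(1))  # 705214
```

Since re.search returns None when nothing matches, calling m.group(...) on the original pattern's result would raise an AttributeError, so it is worth checking the result before using it.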
If you want to find multiple instances of 6-digit substrings between > and <, then try this:
s = '<tag1>111111</tag1> <tag2>222222</tag2>'
m = re.findall(r'>(\d{6})<', s)
In this case, m will be ['111111','222222'].
Upvotes: 1
Reputation: 183
You may want to allow for any whitespace (tabs, spaces, newlines) between the tags. \s* matches zero or more whitespace characters.
s='<td style="width:40px;">\n\n705214\t\n</td>'
m=re.search(r'>\s*(\d{6})\s*<',s)
m.groups()
('705214',)
Parsing HTML is a blast. Usually you treat the file as one long line and strip the leading and trailing whitespace around the values inside the tags. Looking into an HTML table parsing module may help, especially if you need to parse several columns.
See the Stack Overflow answer using lxml etree. Also, html.parser was suggested. Food for thought. (Still learning what modules Python has to offer :) )
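As a rough sketch of the html.parser route, something like the following might work with only the standard library. The CellCollector class and the sample table are made up for illustration, and it assumes the cells are not nested inside each other.

```python
import re
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text inside <td> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_td = True
            self.cells.append('')  # start a new cell

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data  # text may arrive in several pieces


s = '<table><tr><td style="width:40px;">\n\n705214\t\n</td><td>abc</td></tr></table>'
parser = CellCollector()
parser.feed(s)
parser.close()

# Keep only cells whose stripped text is exactly six digits.
six_digit = [c.strip() for c in parser.cells if re.fullmatch(r'\d{6}', c.strip())]
print(six_digit)  # ['705214']
```

The regex is still there, but it only has to validate the text of one cell at a time instead of coping with the markup around it.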
Upvotes: 1
Reputation: 13356
You can also use a look-ahead and a look-behind for the checking:
m = re.search(r'(?<=>)\d{6}(?=<)', s)
l = m.group(0)
This regex will match 6 digits that are preceded by a > and followed by a <.
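To make the difference from a capturing group concrete, here is a short comparison. The sample string is borrowed from another answer in this thread and the variable names are only illustrative.

```python
import re

s = '<tag1>111111</tag1> <tag2>222222</tag2>'

# Capturing group: the overall match still includes the delimiters.
with_group = re.search(r'>(\d{6})<', s)

# Lookarounds: the delimiters are checked but not consumed.
with_lookaround = re.search(r'(?<=>)\d{6}(?=<)', s)

print(with_group.group(0))       # >111111<
print(with_lookaround.group(0))  # 111111

# With no groups in the pattern, findall returns the whole matches.
all_matches = re.findall(r'(?<=>)\d{6}(?=<)', s)
print(all_matches)  # ['111111', '222222']
```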
Upvotes: 1
Reputation: 78630
If you just want to find 6 digits in between a > and a < symbol, use the following regex:
import re
s = '<td style="width:40px;">705214</td>'
m = re.search(r'>(\d{6})<', s)
l = m.groups()[0]
Note the use of parentheses ( and ) to denote a capturing group.
Upvotes: 2