Reputation: 5345
I'd like to find all numbers appearing in a large string. A matching number needs to be followed by 平方米,
and the line above the one holding the number has to contain 土地面积:
as in:
<tr>
<th>土地面积:</th>
<td>10000平方米</td>
</tr>
How can I do this with regex in Python?
Upvotes: 0
Views: 34
Reputation: 26014
You can use the pattern:
(?<=土地面积:<\/th>\n<td>)\d+(?=平方米)
(?<=土地面积:<\/th>\n<td>) is a lookbehind for the literal substring 土地面积:, followed by </th>, followed by a newline and <td>.
\d+ matches one or more digits.
(?=平方米) is a positive lookahead for the substring 平方米.
In Python:
import re
mystr = '''
<tr>
<th>土地面积:</th>
<td>10000平方米</td>
</tr>
'''
print(re.findall(r'(?<=土地面积:<\/th>\n<td>)\d+(?=平方米)',mystr))
Prints:
['10000']
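One caveat: Python's re module only accepts fixed-width lookbehinds, so the pattern above expects exactly one newline and no indentation between </th> and <td>. If the real markup is indented or uses different line endings, a capture group with \s* may be more forgiving. The following is only a sketch under that assumption; the second table row (建筑面积) is made-up sample data, added to show that other labels are skipped.

```python
import re

# Hypothetical sample: indented markup plus a row with a different label.
html = '''
<tr>
  <th>土地面积:</th>
  <td>10000平方米</td>
</tr>
<tr>
  <th>建筑面积:</th>
  <td>2500平方米</td>
</tr>
'''

# A capture group replaces the lookbehind, so \s* can absorb any
# newlines or indentation between </th> and <td>.
pattern = re.compile(r'土地面积:</th>\s*<td>(\d+)平方米')

# With a single group, findall returns only the captured digits.
areas = pattern.findall(html)
print(areas)
```

With this sample only the 土地面积 row matches, so the result is ['10000'].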
Upvotes: 2