Lucien S.

Reputation: 5345

Finding all numbers depending on the content of the previous line with regex

I'd like to find all numbers appearing in a large string. A number should match only if it is followed by 平方米 ("square meters") and the line directly above it contains 土地面积: ("land area:"), as in:

<tr>
<th>土地面积:</th>
<td>10000平方米</td>
</tr>

How can I do this with regex in Python?

Upvotes: 0

Views: 34

Answers (1)

Paolo

Reputation: 26014

You can use the pattern:

(?<=土地面积:<\/th>\n<td>)\d+(?=平方米)
  • (?<=土地面积:<\/th>\n<td>) Positive lookbehind for the literal substring 土地面积:, followed by </th>, a newline, and <td>. The \/ is an escaped /, which Python accepts but does not require.
  • \d+ Matches one or more digits.
  • (?=平方米) Positive lookahead for the substring 平方米.

In Python:

import re

mystr = '''
<tr>
<th>土地面积:</th>
<td>10000平方米</td>
</tr>
'''

# The lookbehind is fixed-width, which Python's re module requires.
print(re.findall(r'(?<=土地面积:<\/th>\n<td>)\d+(?=平方米)', mystr))

Prints:

['10000']
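
The lookbehind above only matches when exactly one newline, with no indentation, separates </th> from <td>, because Python's re module needs lookbehinds to have a fixed width. If the real markup may be indented or use \r\n line endings, a capturing group with \s* is one possible alternative. The following is only a sketch: the sample string, including the extra 建筑面积 ("building area") row, is made up for illustration and is not from the question.

```python
import re

# Hypothetical sample: same markup as above, but with indentation and
# Windows-style line endings, which a fixed-width lookbehind cannot absorb.
html = (
    "<tr>\r\n"
    "  <th>土地面积:</th>\r\n"
    "  <td>10000平方米</td>\r\n"
    "</tr>\r\n"
    "<tr>\r\n"
    "  <th>建筑面积:</th>\r\n"
    "  <td>2500平方米</td>\r\n"
    "</tr>\r\n"
)

# \s* tolerates any whitespace between the header cell and the data cell.
# The parentheses capture only the digits, so findall returns just those.
pattern = re.compile(r"土地面积:</th>\s*<td>(\d+)平方米")

areas = pattern.findall(html)
print(areas)  # ['10000']
```

Only the number under 土地面积: is returned; the 2500 under the other header is skipped. For anything beyond a simple, regular layout like this, an HTML parser is likely to be more robust than a regex.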

Upvotes: 2
