I would like to extract a number from a large HTML file with Python. My idea was to use a regex like this:
import re
text = 'gfgfdAAA1234ZZZuijjk'
try:
    found = re.search(r'AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # re.search returned None: no AAA...ZZZ in the text
    found = ''
print(found)  # 1234
Unfortunately, I'm not used to regexes, and I can't adapt this example to extract 0,54125 from:
(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)
Is there another way to extract the number, or could someone help me with the regex?
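Adapting the AAA(.+?)ZZZ example directly could look like the following minimal sketch: the opening div tag plays the role of AAA, and the first whitespace after the number plays the role of ZZZ. It assumes the class attribute is written exactly as in the snippet; if the markup differs, the match is None and found falls back to ''.

```python
import re

# Snippet from the question; "(...)" stands for the rest of the page.
text = '(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)'

# Opening tag = left delimiter (like AAA), whitespace = right delimiter (like ZZZ).
# The non-greedy (.+?) stops at the first whitespace after the number.
match = re.search(r'<div class="vk_ans vk_bk">(.+?)\s', text)
found = match.group(1) if match else ''
print(found)  # 0,54125
```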
If you want to output 0,54125 (or, more generally, anything matching \d+,\d+), you need to give the regex some condition that anchors the match.
Given the following input:
(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)
you can extract 0,54125 with a lookbehind, for example:
(?<=>)\d+,\d+
or, more strictly:
(?<=<div class="vk_ans vk_bk">)\d+,\d+
and so on.
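A minimal sketch of running both lookbehind patterns on the snippet from the question. A lookbehind only checks what precedes the match, so the matched text (group(0)) is just the number. Python requires lookbehinds to be fixed-width, which a literal string like the opening tag satisfies.

```python
import re

text = '(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)'

# The number must be preceded by '>', but '>' itself is not part of the match.
generic = re.search(r'(?<=>)\d+,\d+', text)

# Stricter: the number must directly follow this exact opening tag.
specific = re.search(r'(?<=<div class="vk_ans vk_bk">)\d+,\d+', text)

# re.search returns None when nothing matches, so check before calling .group().
if generic and specific:
    print(generic.group(0))   # 0,54125
    print(specific.group(0))  # 0,54125
```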
You can remove some characters from your text before searching it. For example, to capture numbers like 12,34, you can do this:
import re
text = 'gfgfdAAA12,34ZZZuijjk'
text = text.replace(',', '')  # drop the comma: 'gfgfdAAA1234ZZZuijjk'
try:
    found = re.search(r'AAA(\d+)ZZZ', text).group(1)
except AttributeError:
    # re.search returned None: no match
    found = ''
print(found)
# 1234
If you need to capture digits anywhere in a line, you can make the pattern more general, like this:
import re
text = '<div class="vk_ans vk_bk">0,54125 count id</div>'
text = text.replace(',', '')
found = re.search(r'(\d+)', text).group(1)  # first run of digits in the line
print(found)
# 054125
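Since the question also asks for another way, here is a minimal sketch that uses the standard library's html.parser instead of a regex. The class name AnswerParser is hypothetical, and the sketch assumes the target div contains only text (no nested div). It also keeps the comma and treats it as a decimal separator. Deleting the comma, as above, turns 0,54125 into 054125, which changes the value.

```python
from html.parser import HTMLParser


class AnswerParser(HTMLParser):
    """Hypothetical helper: collects text inside <div class="vk_ans vk_bk">.

    Assumes the target div has no nested <div> elements.
    """

    def __init__(self):
        super().__init__()
        self.in_target = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == 'div' and ('class', 'vk_ans vk_bk') in attrs:
            self.in_target = True

    def handle_endtag(self, tag):
        if tag == 'div':
            self.in_target = False

    def handle_data(self, data):
        if self.in_target:
            self.texts.append(data)


snippet = '(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)'
parser = AnswerParser()
parser.feed(snippet)

# The first whitespace-separated token of the div's text is the number.
raw = parser.texts[0].split()[0]
# Treat the comma as a decimal separator: swap it for a dot, then convert.
value = float(raw.replace(',', '.'))
print(raw, value)  # 0,54125 0.54125
```

A parser avoids depending on the exact attribute formatting (quotes, spacing) that a regex over raw HTML has to match literally.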