Patrick

Reputation: 2208

Python regex extract number from string

I would like to extract a number from a large HTML file with Python. My idea was to use a regex like this:

import re
text = 'gfgfdAAA1234ZZZuijjk'
try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    found = ''

found

Unfortunately, I'm not used to regex, and I have failed to adapt this example to extract 0,54125 from:

(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)

Is there another way to extract the number, or could someone help me with the regex?

Upvotes: 2

Views: 1081

Answers (2)

Thm Lee

Reputation: 1236

If you want the output 0,54125 (that is, \d+,\d+), you need to set some conditions on what precedes the match.

From the following input,

 (...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)

If you want to extract 0,54125, you can try regexes such as the following:

(?<=\>)\d+,\d+

or,

(?<=\<div class=\"vk_ans vk_bk\"\>)\d+,\d+

and so on.
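
As a minimal sketch of how the two lookbehind patterns above could be used from Python's re module, assuming the input looks exactly like the snippet in the question (the variable names html, generic_value and specific_value are made up for illustration):

```python
import re

# Sample input copied from the question; a real page would be much longer.
html = '(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)'

# Lookbehind on '>' only: matches the first "digits,digits" that directly follows any '>'.
generic = re.search(r'(?<=>)\d+,\d+', html)

# Lookbehind on the full opening tag: only matches inside this specific div.
specific = re.search(r'(?<=<div class="vk_ans vk_bk">)\d+,\d+', html)

# re.search returns None when nothing matches, so fall back to an empty string.
generic_value = generic.group(0) if generic else ''
specific_value = specific.group(0) if specific else ''

print(generic_value)
print(specific_value)
```

The second pattern is stricter, so it is less likely to pick up an unrelated number elsewhere in a large HTML file, but it breaks if the class attribute or the spacing in the tag changes.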

Upvotes: 1

Chen A.

Reputation: 11280

You can replace some characters in your text before searching it. For example, to capture numbers like 12,34 you can do this:

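import re

text = 'gfgfdAAA12,34ZZZuijjk'
try:
    # Drop the comma so the number becomes a plain run of digits
    text = text.replace(',', '')
    found = re.search(r'AAA(\d+)ZZZ', text).group(1)
except AttributeError:
    # re.search returned None: no match
    found = ''

print(found)
# 1234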

If you need to capture the digits inside a line, you can make your pattern more general, like this:

text = '<div class="vk_ans vk_bk">0,54125 count id</div>'
text = text.replace(',', '')
# Matches the first run of digits anywhere in the text
found = re.search(r'(\d+)', text).group(1)

print(found)
# 054125
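
If the comma should be kept, here is a rough sketch of a variant that matches the digits and the comma together and then converts the result to a float. It assumes the comma is a decimal separator, which the question does not state, and the names number_text and value are only illustrative:

```python
import re

text = '<div class="vk_ans vk_bk">0,54125 count id</div>'

# Match digits, a comma, and more digits as one unit instead of deleting the comma.
match = re.search(r'\d+,\d+', text)
number_text = match.group(0) if match else ''

# Assumption: the comma is a decimal separator, so swap it for a dot before converting.
value = float(number_text.replace(',', '.')) if number_text else None

print(number_text)
print(value)
```

This keeps the original string form (0,54125) available while also giving a numeric value to work with.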

Upvotes: 1
