walterudoing

Reputation: 115

Python extracting number out of text using regular expression

I have a string like this:

var hours_tdate = ['22','23','<span style="color:#1d953f;">0</span>','<span style="color:#1d953f;">1</span>'];

This is part of a JS file. I want to use a regex to extract the numbers from the string above and get output like this:

[22,23,0,1]

I have tried:

re.findall('var hours_tdate = \[(.*)\];', string)

And it gives me:

'22','23','<span style="color:#1d953f;">0</span>','<span style="color:#1d953f;">1</span>'

I don't understand why there is no match when I try:

re.findall('var hours_tdate = \[(\d*)\];', string)

Upvotes: 2

Views: 74

Answers (2)

Jan

Reputation: 43169

Here is another approach:

['>](\d+)['<]
# one of ' or >
# followed by digits
# followed by one of ' or <

In Python:

import re
rx = r"['>](\d+)['<]"
matches = [match.group(1) for match in re.finditer(rx, string)]

Or use lookarounds to match only the digits themselves, so no capture group is needed:

(?<=[>'])\d+(?=[<'])

Again, in Python:

import re
rx = r"(?<=[>'])\d+(?=[<'])"
matches = re.findall(rx, string)
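To get integers as in the desired output, the matches can be converted with int(). A minimal sketch, assuming string holds the JS line from the question; the helper name extract_numbers is made up for illustration:

```python
import re

# Assumed input: the JS line from the question.
string = ("var hours_tdate = ['22','23',"
          "'<span style=\"color:#1d953f;\">0</span>',"
          "'<span style=\"color:#1d953f;\">1</span>'];")

def extract_numbers(text):
    """Return digit runs preceded by ' or > and followed by ' or <, as ints."""
    return [int(m) for m in re.findall(r"(?<=[>'])\d+(?=[<'])", text)]

print(extract_numbers(string))  # [22, 23, 0, 1]
```

As for the attempt with \[(\d*)\]; in the question: it finds nothing because \d* would have to cover everything between the brackets, and that text also contains quotes, commas and span tags.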

Upvotes: 0

rock321987

Reputation: 11032

Use \d+ with word boundaries to extract only the numbers:

\b\d+\b


Python Code

import re

# \b on both sides: the digit run must not touch a letter, digit or underscore
p = re.compile(r'\b\d+\b')
test_str = "var hours_tdate = ['22','23','<span style=\"color:#1d953f;\">0</span>','<span style=\"color:#1d953f;\">1</span>'];"

print(p.findall(test_str))


NOTE: Digits in the variable name do not matter as long as they directly follow a letter or underscore (for example hours2_tdate), because \b then finds no word boundary in front of them.
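The same boundary rule is why the 1 in #1d953f is skipped. A small sketch of the behaviour; the colour #123456 and the name hours2_tdate are made-up inputs for illustration, not taken from the question:

```python
import re

pattern = re.compile(r'\b\d+\b')

# '1' is followed by 'd', so there is no word boundary after it: no match.
no_match = pattern.findall("color:#1d953f;")

# A hypothetical all-digit colour has boundaries on both sides, so it matches.
all_digit = pattern.findall("color:#123456;")

# The '2' in the hypothetical name follows 's', so only '22' is found.
name_with_digit = pattern.findall("var hours2_tdate = ['22'];")

print(no_match)         # []
print(all_digit)        # ['123456']
print(name_with_digit)  # ['22']
```

So this pattern relies on the colour values containing at least one letter next to each digit run. If that cannot be guaranteed, a pattern anchored on the surrounding quotes or tags is the safer choice.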

Upvotes: 1
