Adam S

Reputation: 9235

Python and Regex - extracting a number from a string

I'm new to regex, and I'm starting to sort of get the hang of things. I have a string that looks like this:

This is a generated number #123 which is an integer.

The text around the 123 will always be exactly the same, but there may be further text on either side. The number itself may be 123, 597392, or really any run of one or more digits. I believe I can match the number and the following text using \d+(?= which is an integer.), but how do I write the look-behind part?

When I try (?<=This is a generated number #)\d+(?= which is an integer.), it does not match using regexpal.com as a tester.
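For comparison, Python's re module does support fixed-width lookbehind, so the same pattern works there once the trailing period is escaped (an unescaped . matches any character). The mismatch on regexpal.com is most likely because that tester runs patterns through JavaScript's regex engine, which did not support lookbehind at the time. A minimal sketch, assuming the sample sentence below:

```python
import re

text = "This is a generated number #123 which is an integer."

# Fixed-width lookbehind and lookahead around the digits; the final '.'
# is escaped so it only matches a literal period.
pattern = r"(?<=This is a generated number #)\d+(?= which is an integer\.)"

match = re.search(pattern, text)
# match.group() is the matched digits as a string; int() converts it.
number = int(match.group()) if match else None
print(number)  # 123
```

Because the lookarounds are not part of the match, group() returns just the digits, so no capturing group is needed here.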

Also, how would I use Python to get this into a variable (stored as an int)?

NOTE: I only want to find the numbers that are sandwiched in between the text I've shown. The string might be much longer with many more numbers.

Upvotes: 0

Views: 7704

Answers (4)

JBernardo

Reputation: 33397

You don't really need a fancy regex. Just put a capturing group around the part you want.

import re
re.search(r'#(\d+)', 'This is a generated number #123 which is an integer.').group(1)  # '123' (a str; wrap in int() to convert)

If you want to match a number in the middle of some known text, follow the same rule:

r'some text you know (\d+) other text you also know'
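When the known text contains regex metacharacters (such as the trailing . in the question), it can help to build the pattern with re.escape so those characters match literally. A minimal sketch of that idea; the helper name extract_between is hypothetical, not part of any library:

```python
import re

def extract_between(prefix, suffix, text):
    """Hypothetical helper: return the digits between prefix and suffix as an int, or None."""
    # re.escape makes characters like '.' and '#' in the known text match literally.
    pattern = re.escape(prefix) + r"(\d+)" + re.escape(suffix)
    match = re.search(pattern, text)
    return int(match.group(1)) if match else None

text = "Some leading text. This is a generated number #597392 which is an integer. More text."
print(extract_between("This is a generated number #", " which is an integer.", text))  # 597392
```

This only finds the first occurrence; re.findall with the same pattern would return every one.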

Upvotes: 2

Kent

Reputation: 195039

If you only want the number when it follows the text "This is a generated number #" AND is followed by " which is an integer.", you don't need look-behind or lookahead. You can simply match the whole phrase and capture the digits, like:

r"This is a generated number #(\d+) which is an integer\."

I am not sure if I understood what you really want though. :)

Update:

import re

a = 'This is a generated number #123 which is an integer.'
b = 'This should be a generated number #123 which could be an integer.'
exp = r"This is a generated number #(\d+) which is an integer\."

result = re.search(exp, a)
int(result.group(1))  # 123

result = re.search(exp, b)
result is None        # True

Upvotes: 0

Greg Brown

Reputation: 1299

You can just use findall() from the re module.

import re

string = "This is a string that contains #134534 and other things"
match = re.findall(r'#\d+ .+', string)
print(match)

The output would be ['#134534 and other things'] (findall returns a list of all matches).

This matches a # followed by a number of any length (#123 or #123235345), then a space, then the rest of the line up to the next newline character.
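Since the question notes that the string may contain many numbers and only the sandwiched ones are wanted, a variation worth considering is findall with a single capturing group: in that case findall returns only the captured digits rather than the whole match. A minimal sketch, assuming the sample text below:

```python
import re

text = ("This is a generated number #123 which is an integer. "
        "Ignore #999 here. "
        "This is a generated number #597392 which is an integer.")

# With exactly one capturing group, findall returns a list of the group's
# text for each non-overlapping match, so only the digits come back.
pattern = r"This is a generated number #(\d+) which is an integer\."
numbers = [int(n) for n in re.findall(pattern, text)]
print(numbers)  # [123, 597392]
```

The #999 is skipped because it is not surrounded by the known text.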

Upvotes: 0

mre666

Reputation: 336

import re

res = re.search(r'#(\d+)', 'This is a generated number #123 which is an integer.')

if res is not None:
    integer = int(res.group(1))

Upvotes: 0
