blabla

Reputation: 303

Problems with using re.findall() in python

I'm trying to parse a text file and extract certain integers out of it. Each line in my text file is of this format:

a and b

where a is an integer and b could be a float or an integer

e.g. '4 and 10.2356' or '400 and 25'

I need to extract both a and b. I'm trying to use re.findall() to do this:

print re.findall("\d+", txt)[0]  #extract a

#Extract b           
try:
    print float(re.findall("\d+.\d+", txt)[1])
except IndexError:
    print float(re.findall("\d+.\d+", txt)[0])

Here txt is a single line from the file. The reason for the try and except block is as follows:

If a is a single-digit integer, e.g. 4, the try part of the code returns just b. However, if a has more than one digit, e.g. 400, the try part of the code returns both a and b. I found this weird.

However, I don't know how to modify the above code to extract b when it is an integer. I tried putting another try and except block inside the existing except block, but it gave me weird results (in some instances a and b got concatenated). Please help me out.

Also, can anyone please tell me the difference between \d+ and \d+.\d+, and why \d+.\d+ returns 400 but not 4 even though both are integers?

Upvotes: 1

Views: 3173

Answers (1)

Avinash Raj

Reputation: 174696

Just make the part of the pattern that matches the decimal portion optional.

>>> import re
>>> s = '4 and 10.2356'
>>> re.findall(r'\d+(?:\.\d+)?', s)
['4', '10.2356']
>>> print(int(re.findall(r'\d+(?:\.\d+)?', s)[0]))
4
>>> print(float(re.findall(r'\d+(?:\.\d+)?', s)[1]))
10.2356
  • \d+ matches one or more digits.
  • \d+.\d+ matches one or more digits, then any single character, then one or more digits. It therefore needs at least three characters, which is why it matches 400 (4, then 0 as the "any character", then 0) but can never match a lone 4.
  • \d+\.\d+ matches one or more digits, then a literal dot, then one or more digits.
  • \d+(?:\.\d+)? matches integers as well as floating-point numbers, because the part of the pattern that matches the decimal portion is optional. A ? after a capturing or non-capturing group makes the whole group optional.
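
To make the differences between these patterns concrete, here is a small sketch that runs each one against the two sample lines from the question. The helper name parse_line is hypothetical (it is not part of the original answer), and it assumes every line contains exactly two non-negative numbers.

```python
import re

samples = ['4 and 10.2356', '400 and 25']

# Compare what each pattern finds on each sample line.
for pattern in (r'\d+', r'\d+.\d+', r'\d+\.\d+', r'\d+(?:\.\d+)?'):
    for line in samples:
        print(pattern, '|', line, '->', re.findall(pattern, line))


def parse_line(line):
    """Hypothetical helper: return (a, b) from a line like 'a and b'.

    Assumes exactly two non-negative numbers per line; raises ValueError
    (from tuple unpacking) if a different number of matches is found.
    """
    a, b = re.findall(r'\d+(?:\.\d+)?', line)
    return int(a), float(b)


print(parse_line('4 and 10.2356'))  # (4, 10.2356)
print(parse_line('400 and 25'))     # (400, 25.0)
```

Note that the unescaped-dot pattern \d+.\d+ finds only '400' in '400 and 25': the two-character '25' is too short to satisfy digit, any character, digit.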

Upvotes: 2
