Reputation: 303
I'm trying to parse a text file and extract certain integers out of it. Each line in my text file is of this format:
a and b
where a is an integer and b could be a float or an integer
e.g. '4 and 10.2356' or '400 and 25'
I need to extract both a and b. I'm trying to use re.findall() to do this:
print re.findall("\d+", txt)[0] #extract a
#Extract b
try:
print float(re.findall("\d+.\d+", txt)[1])
except IndexError:
print float(re.findall("\d+.\d+", txt)[0])
Here txt is a single line from the file. The reason for the try and except block is as follows:
if a is a single-digit integer, e.g. 4, the try part of the code just returns b. However, if a is not a single-digit integer, e.g. 400, the try part of the code returns both a and b. I found this weird.
However, I don't know how to modify the above code to extract b when it is an integer. I tried putting another try and except block inside the existing except block, but it gave me weird results (in some instances a and b got concatenated). Please help me out.
Also, can anyone please tell me the difference between \d+ and \d+.\d+, and why \d+.\d+ returns 400 and not 4 even though both are integers?
Upvotes: 1
Views: 3173
Reputation: 174696
Just make the part of the pattern that matches the decimal part optional.
>>> import re
>>> s = '4 and 10.2356'
>>> re.findall(r'\d+(?:\.\d+)?', s)
['4', '10.2356']
>>> print(int(re.findall(r'\d+(?:\.\d+)?', s)[0]))
4
>>> print(float(re.findall(r'\d+(?:\.\d+)?', s)[1]))
10.2356
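For a whole file, the same pattern could be wrapped in a small helper that is called once per line. This is only a sketch: the name parse_line is made up for illustration, and it assumes every line holds exactly two numbers, with a being an integer as described in the question.

```python
import re

# An integer, optionally followed by a literal dot and more digits.
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def parse_line(txt):
    """Return (a, b) from a line such as '4 and 10.2356' (hypothetical helper)."""
    numbers = NUMBER.findall(txt)
    if len(numbers) != 2:
        # Assumption: a well-formed line contains exactly two numbers.
        raise ValueError('expected two numbers in %r' % txt)
    a = int(numbers[0])    # a is always an integer
    b = float(numbers[1])  # b may be written as an integer or a float
    return a, b

print(parse_line('4 and 10.2356'))  # (4, 10.2356)
print(parse_line('400 and 25'))     # (400, 25.0)
```

No try/except around the indexing is needed, because findall always returns a first and b second regardless of how many digits a has.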
\d+ matches one or more digits.
\d+.\d+ matches one or more digits plus any single character (except a newline) plus one or more digits. That is why it matches 400 (4, then 0 as the "any character", then 0) but can never match a lone 4: it needs at least three characters.
\d+\.\d+ matches one or more digit characters plus a literal dot plus one or more digits.
\d+(?:\.\d+)? matches integers as well as floating point numbers, because the part of the pattern that matches the decimal part is optional.
A ? after a capturing or non-capturing group makes the whole group optional.
Upvotes: 2
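The difference between these patterns can be checked by running each one against the two sample lines from the question. This is a small illustrative sketch; the variable names are arbitrary.

```python
import re

patterns = [r'\d+', r'\d+.\d+', r'\d+\.\d+', r'\d+(?:\.\d+)?']
lines = ['4 and 10.2356', '400 and 25']

results = {}
for pattern in patterns:
    for txt in lines:
        # findall returns every non-overlapping match, left to right.
        results[(pattern, txt)] = re.findall(pattern, txt)
        print('%-16s %-16r %r' % (pattern, txt, results[(pattern, txt)]))

# \d+            -> ['4', '10', '2356']  and  ['400', '25']
# \d+.\d+        -> ['10.2356']          and  ['400']
# \d+\.\d+       -> ['10.2356']          and  []
# \d+(?:\.\d+)?  -> ['4', '10.2356']     and  ['400', '25']
```

Only the last pattern returns exactly a and b for both lines, which is why the unescaped dot in the original code needed the try/except workaround.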