Reputation: 321
I want to use regex to search in a file for this expression:
time:<float> s
I only want to get the float number. I'm learning about regex, and this is what I did:
astr = 'lalala time:1.5 s\n'
p = re.compile(r'time:(\d+).*(\d+)')
m = p.search(astr)
Well, I get time:1.5 from m.group(0). How can I directly just get 1.5?
Upvotes: 0
Views: 141
Reputation: 3914
I'm including some extra Python-specific material since you said you're learning regex. As already mentioned, the simplest regex for this would certainly be \d+\.\d+, used with the various methods described below.
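As a quick sketch of that simplest pattern applied to the string from the question (Python 3 syntax; the variable names numbers and value are only illustrative):

```python
import re

astr = 'lalala time:1.5 s\n'  # sample string from the question

# findall() returns every non-overlapping match as a list of strings
numbers = re.findall(r'\d+\.\d+', astr)
print(numbers)  # ['1.5']

# The match is still a string, so convert it explicitly
value = float(numbers[0])
print(value)  # 1.5
```

Note that this pattern would also pick up any other decimal number in the line, not only the one after time:.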
Something that threw me off with Python initially was getting my head around the return types of the various re methods and when to use group() vs. groups().
There are several methods you might use:
match() will only return a match object if the pattern is found at the beginning of the string.
search() will find the first match and stop.
findall() will find every match in the string.
The return type for match() and search() is a match object (re.Match in current Python 3), or None if a match isn't found. The return type for findall(), however, is a list: a list of strings when the pattern has at most one group, or a list of tuples when it has several. These different return types determine how you get the values out of your match.
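A minimal sketch of those three return types side by side, using a made-up sample string (Python 3 syntax):

```python
import re

text = 'abc 123 def 456'  # made-up sample string

# match() only looks at the start of the string, so this finds nothing
m = re.match(r'\d+', text)
print(m)  # None

# search() scans forward and stops at the first match
s = re.search(r'\d+', text)
print(s.group())  # 123

# findall() returns a plain list of strings, not match objects
f = re.findall(r'\d+', text)
print(f)  # ['123', '456']
```

Because match() and search() can return None, it is usually worth checking the result before calling group() on it.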
Both match() and search() return an object that exposes the group() and groups() methods for retrieving your matches. With findall() you'll instead iterate through the list or pull a value out by index. So, using findall():
>>> import re
>>> search_me = '123abc'
>>> easy = re.compile(r'123')
>>> matches = easy.findall(search_me)
>>> for match in matches:
...     print(match)
...
123
If you're using search() or match(), you'll want to use .group() or groups() to retrieve your match depending on how you've set up your regular expression.
From the documentation, "The groups() method returns a tuple containing the strings for all the subgroups, from 1 up to however many there are."
Therefore, if you have no groups in your regex, as in the following example, groups() returns an empty tuple:
>>> import re
>>> search_me = '123abc'
>>> easy = re.compile(r'123')
>>> matches = easy.search(search_me)
>>> print(matches.groups())
()
Adding a "group" to your regular expression enables you to use this:
>>> import re
>>> search_me = '123abc'
>>> easy = re.compile(r'(123)')
>>> matches = easy.search(search_me)
>>> print(matches.groups())
('123',)
You don't have to specify groups in your regex. group(0) will return the entire match even if you don't have anything in parentheses in your expression, and group() with no argument is the same as group(0).
>>> import re
>>> search_me = '123abc'
>>> easy = re.compile(r'123')
>>> matches = easy.search(search_me)
>>> print(matches.group(0))
123
If you are using parentheses, you can use group() to retrieve specific groups and subgroups. Groups are numbered by the position of their opening parenthesis, from left to right.
>>> import re
>>> search_me = '123abc'
>>> easy = re.compile(r'((1)(2)(3))')
>>> matches = easy.search(search_me)
>>> print(matches.group(1))
123
>>> print(matches.group(2))
1
>>> print(matches.group(3))
2
>>> print(matches.group(4))
3
I'd also like to point out that you don't have to compile your regex unless you prefer to for usability and/or readability. Because the re module caches recently compiled patterns, it rarely makes a noticeable difference to performance.
>>> import re
>>> search_me = '123abc'
>>> # easy = re.compile(r'123')
>>> # matches = easy.search(search_me)
>>> matches = re.search(r'123', search_me)
>>> print(matches.group())
123
Hope this helps! I found sites like debuggex helpful while learning regex. (Although sometimes you have to refresh those pages; I was banging my head for a couple hours one night before I realized that after reloading the page my regex worked just fine.) Lately I think you're served just as well by throwing sandbox code into something like wakari.io, or an IDE like PyCharm, etc., and observing the output. http://www.rexegg.com/ is also a good site for general regex knowledge.
Upvotes: 1
Reputation: 310307
I think the regex you actually want is something more like:
re.compile(r'time:(\d+\.\d+)')
or even:
re.compile(r'time:(\d+(?:\.\d+)?)') # This one will capture integers too.
Note that I've put the entire time into one group. I've also escaped the ., which otherwise means "any character" in regex.
Then you'd get 1.5 from m.group(1). m.group(0) is the entire match, m.group(1) is the first submatch (parenthesized grouping), m.group(2) is the second grouping, etc.
Example:
>>> import re
>>> p = re.compile(r'time:(\d+(?:\.\d+)?)')
>>> p.search('time:34')
<_sre.SRE_Match object at 0x10fa77d50>
>>> p.search('time:34').group(1)
'34'
>>> p.search('time:34.55').group(1)
'34.55'
Upvotes: 0
Reputation: 55760
You could create another group for that. I would also change the regex slightly to allow for numbers that don't have a decimal separator.
re.compile(r'time:((\d+)(\.?(\d+))?)')
Now you can use group(1) to get the floating-point number.
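To see how the nested groups are numbered, here is a small sketch that assumes the pattern above with its outermost group closed, run against the string from the question (Python 3 syntax):

```python
import re

# Outer group 1 wraps the whole number; groups 2 to 4 are its parts
p = re.compile(r'time:((\d+)(\.?(\d+))?)')

m = p.search('lalala time:1.5 s\n')
print(m.group(1))  # 1.5
print(m.groups())  # ('1.5', '1', '.5', '5')

# Without a fractional part, the optional groups come back as None
n = p.search('time:34')
print(n.groups())  # ('34', '34', None, None)
```

If only the full number is needed, the inner groups could be made non-capturing with (?:...) so that group(1) is the only group.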
Upvotes: 0