Reputation: 93
I'm trying to get some substrings from text.
I'm using https://pythex.org/ to check my regular expressions.
pythex.org shows that everything is correct with my regexps, but when I use them in my code the second regexp doesn't work and I get
AttributeError: 'NoneType' object has no attribute 'group'
I want to print the uri variable, but only the timestamp is printed. Example code:
import re
line = "2019-01-30 01:05:26.255595500 tracker uri='/tracker_log/?f=__lxGc__&step=1&ses_id=2yz65vcsg0k8zk1952295510&id=123123&type=ad&rt=952301228' referer='https://instagram.com' ua='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:58.0) Gecko/20100101 Firefox/58.0'"
timestamp = re.match(r"\d+-\d+-\d+.\d+:.\d+:.\d+.\d+", line)
if timestamp:
    print(timestamp.group(0))
uri = re.match(r"(?<=uri=\').+(?=\' ref)", line)
if uri:
    print(uri.group(0))
Any help would be appreciated!
Upvotes: 3
Views: 1168
Reputation: 15120
re.match only returns a match object if the regex pattern matches at the beginning of the string, which is why you successfully match the timestamp at the start of the string but not the uri substring.
Use re.search instead; it returns a match object for the first location in the string where the regex pattern matches, or None if there is no match.
For example:
import re
line = "2019-01-30 01:05:26.255595500 tracker uri='/tracker_log/?f=__lxGc__&step=1&ses_id=2yz65vcsg0k8zk1952295510&id=123123&type=ad&rt=952301228' referer='https://instagram.com' ua='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:58.0) Gecko/20100101 Firefox/58.0'"
uri = re.search(r"(?<=uri=\').+(?=\' ref)", line)
print(uri.group(0))
# OUTPUT
# /tracker_log/?f=__lxGc__&step=1&ses_id=2yz65vcsg0k8zk1952295510&id=123123&type=ad&rt=952301228
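To make the difference concrete, here is a minimal sketch that runs the same pattern through both functions. The shortened sample line and the capturing-group pattern are my own illustration (assumptions, not part of the original code); the capturing group is just one alternative to the lookbehind/lookahead, and the None check avoids the AttributeError from the question.

```python
import re

# Hypothetical shortened log line, used only for illustration.
sample = "2019-01-30 01:05:26 tracker uri='/tracker_log/?step=1' referer='https://instagram.com'"

# Capture everything between the quotes that follow "uri=".
pattern = r"uri='([^']+)'"

# re.match only tries position 0, where the timestamp sits, so it returns None.
anchored = re.match(pattern, sample)

# re.search scans the string and returns the first place the pattern matches.
found = re.search(pattern, sample)

# Check for None before calling .group() to avoid the AttributeError.
uri = found.group(1) if found else None

print(anchored)  # None
print(uri)       # /tracker_log/?step=1
```

If the pattern may legitimately be absent from some lines, keeping the None check lets you skip those lines instead of crashing.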
Upvotes: 4