dvrs

Reputation: 93

A correct Python regexp returns NoneType

I'm trying to extract some substrings from a line of text.

I'm using https://pythex.org/ to check my regular expressions.

pythex.org shows that everything is correct with my regexps, but when I use them in my code the second regexp doesn't work and I get

AttributeError: 'NoneType' object has no attribute 'group'

I want to print the uri variable, but only the timestamp is printed. Example code:

import re
line = "2019-01-30 01:05:26.255595500 tracker uri='/tracker_log/?f=__lxGc__&step=1&ses_id=2yz65vcsg0k8zk1952295510&id=123123&type=ad&rt=952301228' referer='https://instagram.com' ua='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:58.0) Gecko/20100101 Firefox/58.0'"

timestamp = re.match("\d+-\d+-\d+.\d+:.\d+:.\d+.\d+", line)
if timestamp:
    print(timestamp.group(0))
uri = re.match("(?<=uri=\').+(?=\' ref)", line)
if uri:
    print(uri.group(0))

Any help would be appreciated!

Upvotes: 3

Views: 1168

Answers (1)

benvc

Reputation: 15120

re.match only returns a match object if the pattern matches at the very beginning of the string. That is why the timestamp, which starts the line, matches, while the uri pattern, which can only match later in the line, does not.

Use re.search instead: it scans the string and returns a match object for the first location where the pattern matches.

For example:

import re

line = "2019-01-30 01:05:26.255595500 tracker uri='/tracker_log/?f=__lxGc__&step=1&ses_id=2yz65vcsg0k8zk1952295510&id=123123&type=ad&rt=952301228' referer='https://instagram.com' ua='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:58.0) Gecko/20100101 Firefox/58.0'"

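# re.search scans the whole line instead of anchoring at position 0
uri = re.search(r"(?<=uri=\').+(?=\' ref)", line)

# re.search returns None when nothing matches, so check before calling .group()
if uri:
    print(uri.group(0))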
# OUTPUT
# /tracker_log/?f=__lxGc__&step=1&ses_id=2yz65vcsg0k8zk1952295510&id=123123&type=ad&rt=952301228
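To see the difference side by side, here is a minimal sketch that runs the same pattern through both functions. The log line is a shortened, made-up stand-in for the one in the question, and the lazy `.+?` is my own tweak (not part of the original pattern) so the match stops at the first `' ref`.

```python
import re

# Shortened, hypothetical log line in the same shape as the one in the question.
line = "2019-01-30 01:05:26.255595500 tracker uri='/tracker_log/?step=1' referer='https://instagram.com'"

# Same lookbehind/lookahead idea as above; .+? (lazy) is an assumption of this sketch.
pattern = r"(?<=uri=').+?(?=' ref)"

# re.match only tries position 0, where the line holds a timestamp, so it returns None.
anchored = re.match(pattern, line)

# re.search tries every position and returns the first match it finds.
found = re.search(pattern, line)

print(anchored)
if found:
    print(found.group(0))
```

If the pattern has to stay with re.match for some reason, it would need to describe everything from the start of the line up to the uri value, which is usually more awkward than switching to re.search.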

Upvotes: 4
