ShawnC

Use RegEx in Python to extract URL and optional query string from web server log data

Disclosure: I'm very much a regex newbie, so I'm trying to tweak some example code I found that parses web server log data into named groups. Here is the part of my modified regex so far that handles the URL and query string groups:

(?P<url>.+)(?P<querystr>\?.*)

This works fine when the string it's applied to actually has a query string on the URL (each group captures the expected part of the string), but it fails to match when there is none. So I tried adding a '?' after the "querystr" group to make it optional, i.e. (?P<querystr>\?.*)? . If there's no query string, it now works as expected (nothing is captured into querystr), but when there is one, the query string is still captured as part of url rather than separately into querystr.
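A minimal sketch of why making the group optional changed the result, comparing the two patterns from the question on a made-up URL path (the sample strings and the variable names required and optional are illustrative, not from the question). Because .+ is greedy, it first consumes the whole string, '?' included. When querystr is mandatory, the engine must backtrack until \? can match, which splits the string correctly. When querystr is optional, skipping it already yields a successful match, so the engine never backtracks and the query string stays inside url.

```python
import re

# Both patterns are taken from the question; the sample paths are made up.
required = re.compile(r"(?P<url>.+)(?P<querystr>\?.*)")   # querystr mandatory
optional = re.compile(r"(?P<url>.+)(?P<querystr>\?.*)?")  # querystr optional

path = "/search?q=regex&page=2"

# Mandatory group: .+ backtracks until \? can match, so the split works...
print(required.match(path).groupdict())
# → {'url': '/search', 'querystr': '?q=regex&page=2'}

# ...but a path with no query string cannot match at all.
print(required.match("/index.html"))
# → None

# Optional group: .+ keeps the whole string and querystr is simply skipped.
print(optional.match(path).groupdict())
# → {'url': '/search?q=regex&page=2', 'querystr': None}
```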

What's the best way to make a group optional (assuming that's even the right approach in this case)? Thanks in advance.

Answers (1)

Wiktor Stribiżew

You can use

^(?P<url>[^?]+)(?P<querystr>\?.*)?$

Details

  • ^ - start of string
  • (?P<url>[^?]+) - Group "url": one or more chars other than ?
  • (?P<querystr>\?.*)? - an optional Group "querystr": a ? char followed by zero or more chars other than line break chars, as many as possible
  • $ - end of string
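A short sketch of the pattern above in Python's re module, on made-up sample paths. Since [^?] can never consume a '?', url stops at the first one and the optional querystr group picks up the rest, or is None when there is no query string. For comparison it also shows a lazy variant, ^(?P<url>.+?)(?P<querystr>\?.*)?$, which is a swapped-in alternative and not part of the answer.

```python
import re

# The answer's pattern: url cannot contain "?", so it stops at the first one.
pattern = re.compile(r"^(?P<url>[^?]+)(?P<querystr>\?.*)?$")

# Alternative (not from the answer): a lazy .+? grows only as far as needed
# for the rest of the pattern, including the $ anchor, to match.
lazy = re.compile(r"^(?P<url>.+?)(?P<querystr>\?.*)?$")

samples = ["/search?q=regex&page=2", "/index.html"]

for s in samples:
    print(pattern.match(s).groupdict())
# → {'url': '/search', 'querystr': '?q=regex&page=2'}
# → {'url': '/index.html', 'querystr': None}
```

One design difference: the negated class splits correctly even without the $ anchor, whereas the lazy variant relies on $; without it, .+? would stop after a single character and querystr would be skipped.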

See the regex demo.
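In the question the URL groups are only part of a larger log-parsing regex, so the ^ and $ anchors would refer to the whole log line rather than the URL. A sketch under the assumption that the log uses an Apache Common Log Format-style quoted request field (the sample lines and the method and protocol group names are hypothetical): url additionally stops at whitespace, and querystr uses \S* so it cannot run past the URL into the protocol.

```python
import re

# Hypothetical Common Log Format-style lines; only the quoted request matters here.
lines = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /search?q=regex HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /index.html HTTP/1.1" 200 512',
]

# Inside a larger pattern there is no ^/$ around the URL, so url must also
# stop at whitespace, and querystr must not cross the space before the protocol.
request = re.compile(
    r'"(?P<method>[A-Z]+) '
    r'(?P<url>[^?\s]+)(?P<querystr>\?\S*)? '
    r'(?P<protocol>[^"]+)"'
)

for line in lines:
    m = request.search(line)
    print(m.group("url"), m.group("querystr"))
# → /search ?q=regex
# → /index.html None
```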
