Ricky Ng

Reputation: 39

Regular expressions in Python

I am struggling to write a regular expression in Python. For instance, I got the following case right:

"GET /images/launch-logo.gif HTTP/1.0" 220 1839

is matched by

"(\S+) (\S+)\s*(\S*)" (\d{3}) (\S+)

However, I still need the same expression to handle all of the following cases as well:

  1. "GET /history/history.html hqpao/hqpao_home.html HTTP/1.0" 200 1502
  2. "GET /shuttle/missions/missions.html Shuttle Launches from Kennedy Space Center HTTP/1.0"200 8677
  3. "GET /finger @net.com HTTP/1.0"404 -

Obviously I should change the bold part of the expression:

"(\S+) (\S+)\s*(\S*)" (\d{3}) (\S+)

But how should I change it? One approach I have in mind is to change the bold part to

[\s |(\s*)(\S+) |(\S+)(12) |(\S+)]

where the 2nd, 3rd, and 4th alternatives correspond to the extra cases (1), (2), and (3) I need to deal with.

But my expression does not work. What am I misunderstanding about regular expressions? I am simply trying to handle the input case by case.

Upvotes: 2

Views: 101

Answers (2)

Wiktor Stribiżew

Reputation: 627488

You may use

^"([^\s"]+)\s+([^\s"]+)(?:\s+([^"]+?))?\s+([A-Z]+/\d[\d.]*)"\s*(\d{3})\s*(\S+)$

See the regex demo

Details

  • ^ - start of a line (use re.M if you are reading the whole file into a variable, f.read())
  • " - a double quotation mark
  • ([^\s"]+) - Group 1: one or more chars other than whitespace and a double quotation mark
  • \s+ - 1+ whitespaces
  • ([^\s"]+) - Group 2: one or more chars other than whitespace and a double quotation mark
  • (?:\s+([^"]+?))? - an optional non-capturing group matching
    • \s+ - 1+ whitespaces
    • ([^"]+?) - Group 3: any 1 or more chars other than ", as few as possible
  • \s+ - 1+ whitespaces
  • ([A-Z]+/\d[\d.]*) - Group 4: 1+ uppercase letters, /, and then 1 digit followed by any 0+ digits or . chars
  • " - a double quotation mark
  • \s* - 0+ whitespaces
  • (\d{3}) - Group 5: three digits
  • \s* - 0+ whitespaces
  • (\S+) - Group 6: 1 or more non-whitespace chars
  • $ - end of string (or end of a line when re.M is used).
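
As a rough check, the pattern above can be run against the four sample lines from the question. This is only a minimal sketch: the names LOG_LINE, SAMPLE and parse_log are made up for illustration, and the sample lines are joined with newlines to imitate text read with f.read(), which is why re.M is passed.

```python
import re

# Pattern copied from the answer above. re.M makes ^ and $ match at
# every line boundary, not only at the start and end of the whole text.
LOG_LINE = re.compile(
    r'^"([^\s"]+)\s+([^\s"]+)(?:\s+([^"]+?))?\s+([A-Z]+/\d[\d.]*)"\s*(\d{3})\s*(\S+)$',
    re.M,
)

# The four sample lines from the question, joined as if read with f.read().
SAMPLE = "\n".join([
    '"GET /images/launch-logo.gif HTTP/1.0" 220 1839',
    '"GET /history/history.html hqpao/hqpao_home.html HTTP/1.0" 200 1502',
    '"GET /shuttle/missions/missions.html Shuttle Launches from Kennedy Space Center HTTP/1.0"200 8677',
    '"GET /finger @net.com HTTP/1.0"404 -',
])


def parse_log(text):
    """Return one tuple of the six captured groups per matching line."""
    return [m.groups() for m in LOG_LINE.finditer(text)]


for fields in parse_log(SAMPLE):
    # Group 3 (the optional extra text) is None when the request has
    # nothing between the path and the protocol.
    print(fields)
```

When processing a file line by line instead, the same pattern should work without re.M by calling LOG_LINE.match(line.rstrip("\n")) on each line.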

Upvotes: 0

Dani G

Reputation: 1252

This might be a bit messy, but it works:

\"(\S+) (\S+[\s\w\.\@]*)\s*(\S*)\"\s?(\d{3})\s(\S+)*

You can play with it on Regexr. Regexr Shared Link

Upvotes: 1
