Danis Fermi

Reputation: 611

Parsing Apache Log using Regex

I want to get the following:

Input

GET /1.1/friendships/list.json?user_id=123 HTTP/1.1
GET /1.1/friendships/list.json HTTP/1.1
GET /1.1/users/show.json?include_entities=1&user_id=321 HTTP/1.1
GET /1.1/friendships/list.json?user_id=234 HTTP/1.1
GET /1.1/friendships/create.json HTTP/1.1

Output

/1.1/friendships/list.json
/1.1/friendships/list.json
/1.1/users/show.json
/1.1/friendships/list.json
/1.1/friendships/create.json

I have been able to match up to the question mark character. I want to match a character that is either a question mark or a space. Here is what I have so far:

([A-Z])+ (\S)+[\?]

Upvotes: 2

Views: 1436

Answers (2)

Jan

Reputation: 43169

The following expression accepts GET and POST:

^(?:GET|POST)\s+([^?\s]+).*$

Broken down, this says:

^               # start of line
(?:GET|POST)\s+ # GET or POST literally, followed by at least one whitespace character
([^?\s]+)       # group 1: one or more characters that are neither a question mark nor whitespace
.*              # 0+ chars afterwards
$               # end of line

Replace each match with \1 (the captured path). See a demo on regex101.com, and mind the MULTILINE flag, which lets ^ and $ match at the start and end of every line.


In Python, this would be:

import re

string = """
GET /1.1/friendships/list.json?user_id=123 HTTP/1.1
GET /1.1/friendships/list.json HTTP/1.1
GET /1.1/users/show.json?include_entities=1&user_id=321 HTTP/1.1
GET /1.1/friendships/list.json?user_id=234 HTTP/1.1
GET /1.1/friendships/create.json HTTP/1.1
POST /some/other/url/here
"""

rx = re.compile(r'^(?:GET|POST)\s+([^?\s]+).*$', re.M)
matches = rx.findall(string)
print(matches)
# ['/1.1/friendships/list.json', '/1.1/friendships/list.json', '/1.1/users/show.json', '/1.1/friendships/list.json', '/1.1/friendships/create.json', '/some/other/url/here']
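The replacement with \1 mentioned earlier can also be sketched with re.sub, which rewrites each matching line to just its captured path. This is only an illustration under the same pattern; the names log and paths_only are made up for the example.

```python
import re

# Two sample lines in the same shape as the question's input.
log = (
    "GET /1.1/friendships/list.json?user_id=123 HTTP/1.1\n"
    "GET /1.1/friendships/create.json HTTP/1.1"
)

# Same pattern as above; re.M makes ^ and $ anchor at every line.
rx = re.compile(r'^(?:GET|POST)\s+([^?\s]+).*$', re.M)

# Replace each whole line with group 1 (the path without the query string).
paths_only = rx.sub(r'\1', log)
print(paths_only)
```

Lines that do not start with GET or POST are left untouched by sub, so findall is probably the better fit when only the paths are wanted.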

Upvotes: 1

CinCout

Reputation: 9619

This should do:

GET\s*(\S*?)[?\s]

Demo

The idea is to match lazily (the non-greedy quantifier *?) up to the first ? or whitespace character. Keeping that delimiter outside the parentheses means group 1 holds only the required text, without a trailing ? or space.
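A minimal Python sketch of this lazy approach, assuming the delimiter class sits outside the capture group so that group 1 contains only the path. The names lines, lazy_rx and lazy_paths are hypothetical, chosen just for this example.

```python
import re

# Two sample lines in the same shape as the question's input.
lines = [
    "GET /1.1/friendships/list.json?user_id=123 HTTP/1.1",
    "GET /1.1/friendships/create.json HTTP/1.1",
]

# \S*? expands one character at a time until a ? or whitespace follows.
lazy_rx = re.compile(r'GET\s*(\S*?)[?\s]')

lazy_paths = []
for line in lines:
    m = lazy_rx.search(line)
    if m:  # skip lines that are not GET requests
        lazy_paths.append(m.group(1))

print(lazy_paths)
```

Unlike the anchored pattern in the other answer, this one handles only GET and needs a ? or whitespace after the path, so a request line that ends right after the URL would not match.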

Upvotes: 0
