Programmer101

Reputation: 1

Regex to get URL from log file and store it in a dictionary in Python

import re

filename = "access.log"

path = ""

regex = re.compile(r'(?:(GET|POST) )(\S+)')       # Raw string so \S is not treated as a string escape

with open(path + filename, "r") as logfile:
  for line in logfile:                            # Loops through the log file
    match = regex.search(line)                    # Finds the first request in the line, if any
    if match:                                     # Skips lines without a GET or POST request
      print(match.group(2))                       # Prints the URL found on this line

This is an example line from the log file:

access.log

209.160.24.63 - - [01/Feb/2021:18:22:17] "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1" 200 2550 "http://www.google.com/productid=12wdef" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5" 422

I can extract the URL from the request (the part in bold), but now I want to split it into its parts and store them in a dict in Python.

Upvotes: 0

Views: 563

Answers (2)

Programmer101

Reputation: 1

import re

filename = "access.log"
dictionary = {}
list_resources = []
count = 0
regex = re.compile(r'(?:(GET|POST) )(\S+)')       # Raw string so \S is not treated as a string escape

with open(filename, "r") as logfile:

  for line in logfile:                            # Loops through the log file
    match = regex.search(line)
    if not match:                                 # Skips lines without a GET or POST request
      continue
    url = match.group(2)                          # The requested URL, including any query string
    list_resources.append(url)

    resource, _, parameters = url.partition("?")  # parameters is "" when the URL has no query string
    param_dict = {}

    for pair in parameters.split("&"):
      if not pair:                                # Skips empty pieces such as a trailing "&"
        continue
      key, _, value = pair.partition("=")         # value is "" when the pair has no "="
      param_dict[key] = value

    dictionary[count] = {'resource': resource, 'parameters': param_dict}
    count += 1

# print(list_resources)

print(dictionary)

I figured out what I wanted to do: split up the URL and store the resource and its parameters in a dictionary.
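The same split could probably also be done with the standard library's urllib.parse instead of manual splitting. This is only a minimal sketch, assuming the URL has the same shape as in the log line above; parse_request_url is a hypothetical helper name, not part of the code above. Note that parse_qsl decodes percent-encoded values and drops parameters with blank values by default, so the result can differ slightly from the manual approach.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_request_url(url):
    """Split a request URL into its resource path and a dict of query parameters.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # parse_qsl yields (key, value) pairs; dict() keeps the last value of a repeated key
    return {"resource": parts.path, "parameters": dict(parse_qsl(parts.query))}

example = parse_request_url("/product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953")
print(example)
```

A URL without a query string simply yields an empty parameters dict.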

Upvotes: 0

Niraeth

Reputation: 313

Since you already have the bolded string, you can just split it on the first whitespace that occurs in it:

s = "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953"
s.split(" ", 1)

This should return:

['GET', '/product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953']

You can then transform the data as needed.
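One way that transformation might look is sketched below. It assumes the string always has the form "METHOD URL" and that parameters are separated by "&"; the variable names are illustrative only.

```python
s = "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953"

method, url = s.split(" ", 1)               # split on the first space only
resource, _, query = url.partition("?")     # query is "" when there is no "?"

# Build {key: value} from "key=value" pairs, skipping empty pieces
parameters = dict(pair.partition("=")[::2] for pair in query.split("&") if pair)

result = {"method": method, "resource": resource, "parameters": parameters}
print(result)
```

Using partition rather than split avoids an IndexError when a URL has no "?" or a pair has no "=".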

Upvotes: 0
