Reputation: 1
import re
filename = "access.log"
path = ""
with open(path + filename, "r") as logfile:
    count = 0
    for line in logfile:  # Loop through the log file
        regex = r'(?:(GET|POST) )(\S+)'  # Raw string so \S is not treated as a string escape
        url = re.findall(regex, line)  # List of (method, URL) tuples found in the line
        print(url[0][1])  # Print the URL from the first match
This is an example of the log file
access.log
209.160.24.63 - - [01/Feb/2021:18:22:17] "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1" 200 2550 "http://www.google.com/productid=12wdef" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5" 422
I got the URL (the part in bold), but now I want to split it up and store it in a dict in Python.
Upvotes: 0
Views: 563
Reputation: 1
import re
filename = "access.log"
dictionary = {}
list_resources = []
count = 0
regex = r'(?:(GET|POST) )(\S+)'  # Raw string; defined once outside the loop
with open(filename, "r") as logfile:
    for line in logfile:  # Loop through the log file
        matches = re.findall(regex, line)  # List of (method, URL) tuples
        if not matches:  # Skip lines without a GET or POST request
            continue
        url = matches[0][1]
        list_resources.append(url)
        resource, _, parameters = url.partition("?")  # Split the path from the query string
        param_dict = {}
        for pair in parameters.split("&"):
            if not pair:  # No query string, or an empty segment
                continue
            key, _, value = pair.partition("=")
            param_dict[key] = value
        dictionary[count] = {'resource': resource, 'parameters': param_dict}
        count += 1
# print(list_resources)
print(dictionary)
I figured out what I wanted to do: split up the URL and store the resource and parameters in a dictionary.
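As an alternative to splitting by hand, the standard library's urllib.parse module can do the same job. This is only a sketch: the helper name split_request_url is made up for illustration, and note that parse_qsl also decodes percent-escapes and turns "+" into a space, which the manual split does not do.

```python
from urllib.parse import urlsplit, parse_qsl


def split_request_url(url):
    """Return the resource path and a dict of query parameters (hypothetical helper)."""
    parts = urlsplit(url)
    # parse_qsl yields (key, value) pairs; dict() keeps the last value for a repeated key.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return {"resource": parts.path, "parameters": params}


result = split_request_url(
    "/product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953"
)
print(result)
```

If a URL can repeat the same parameter and every value matters, parse_qs (which returns a list of values per key) may be a better fit than dict(parse_qsl(...)).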
Upvotes: 0
Reputation: 313
Since you already have the bolded string, you can split it at the first whitespace that occurs in the string:
s = "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953"
s.split(" ", 1)
should return
['GET', '/product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953']
You can then transform the data as needed.
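One way that follow-up transformation might look, as a rough sketch: the function name request_to_dict and the keys of the returned dict are my own choices, not something from the question. It assumes the input always has the form "METHOD URL"; a string with no space would raise a ValueError at the unpacking step.

```python
def request_to_dict(request):
    """Split a 'METHOD URL' string into method, resource, and parameters (hypothetical helper)."""
    method, url = request.split(" ", 1)  # Split at the first space only
    resource, _, query = url.partition("?")  # query is "" when there is no "?"
    parameters = {}
    for pair in query.split("&"):
        if pair:  # Skip empty segments
            key, _, value = pair.partition("=")
            parameters[key] = value
    return {"method": method, "resource": resource, "parameters": parameters}


s = "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953"
parsed = request_to_dict(s)
print(parsed)
```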
Upvotes: 0