Reputation: 11
I have a file that contains many different events from a service. I want to put each event on its own line and remove some "words & elements". Example of the log file:
"Event1":{"Time":"2022-12-16 16:04:16","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action1","Data":"Datahere"},"Event2":{"Time":"2022-12-16 16:03:59","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action2","Data":"Datahere"},"Event3":{"Time":"2022-12-16 15:54:56","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action3","Data":"Datahere"},
As you can see, they all start with "EventX". In the end I want to see:
{"Time":"2022-12-16 16:04:16","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action1","Data":"Datahere"}
{"Time":"2022-12-16 16:03:59","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action2","Data":"Datahere"}
{"Time":"2022-12-16 15:54:56","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action3","Data":"Datahere"}
As you can see, "EventX": and "," are removed, and each event is now on its own line in the file.
I'm just a beginner with Python and cannot figure this one out.
Thanks
I tried combining re.search and re.findall without luck. I also tried to find a way to copy only the text between {} and add it back later, again with no luck.
Upvotes: 0
Views: 99
Reputation: 11883
This construct below works and makes a list of dictionaries from your data. You could smash down some of this syntax with list or dictionary comprehensions, but it isn't needed.
If you are having trouble testing the regex expressions, this site is invaluable.
import re  # the standard-library module is enough for these patterns
data = '''"Event1":{"Time":"2022-12-16 16:04:16","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action1","Data":"Datahere"},"Event2":{"Time":"2022-12-16 16:03:59","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action2","Data":"Datahere"},"Event3":{"Time":"2022-12-16 15:54:56","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action3","Data":"Datahere"},'''
splitter = r'"Event\d+":{(.*?)}'  # a search pattern to capture the stuff in braces

# tokenize the data source...
tokens = re.findall(splitter, data)
#print(tokens)

# now we can operate on the tokens and split them up into key-value pairs and put them into a list
result = []
for token in tokens:
    # make an empty dictionary to hold the row elements
    line_dict = {}
    # we can split the line (token) by comma to get the key-value pairs
    # (this assumes no value contains a comma)
    pairs = token.split(',')
    for pair in pairs:
        # another regex split needed here, because the timestamps have colons too
        pair_pattern = r'"(.*)"\s*:\s*"(.*)"'  # capture two groups of things in quotes on opposite sides of colon
        parts = re.search(pair_pattern, pair)
        key, value = parts.group(1), parts.group(2)
        line_dict[key] = value
    # add the dictionary of line elements to the result
    result.append(line_dict)

for d in result:
    print(d)
{'Time': '2022-12-16 16:04:16', 'Username': '[email protected]', 'IP_Address': '1.1.1.1', 'Action': 'Action1', 'Data': 'Datahere'}
{'Time': '2022-12-16 16:03:59', 'Username': '[email protected]', 'IP_Address': '1.1.1.1', 'Action': 'Action2', 'Data': 'Datahere'}
{'Time': '2022-12-16 15:54:56', 'Username': '[email protected]', 'IP_Address': '1.1.1.1', 'Action': 'Action3', 'Data': 'Datahere'}
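If you want the lines printed in the same JSON style as in the question (double quotes, no spaces), another option is to let the standard json module do the parsing, since each event body already looks like valid JSON. This is a minimal sketch rather than the method used above: it assumes the whole log is one comma-separated run of "EventN":{...} pairs like the sample, and the function name events_to_json_lines and the shortened sample values are made up for illustration.

```python
import json

def events_to_json_lines(raw):
    """Return one compact JSON string per event, in the original order."""
    # Drop surrounding whitespace and the trailing comma, then wrap the
    # text in braces so it becomes a single valid JSON object.
    wrapped = "{" + raw.strip().rstrip(",") + "}"
    events = json.loads(wrapped)  # {"Event1": {...}, "Event2": {...}, ...}
    # separators=(",", ":") keeps the compact style used in the log
    return [json.dumps(event, separators=(",", ":")) for event in events.values()]

# Shortened sample in the same shape as the log (values are placeholders)
sample = '"Event1":{"Time":"2022-12-16 16:04:16","Action":"Action1"},"Event2":{"Time":"2022-12-16 16:03:59","Action":"Action2"},'

for line in events_to_json_lines(sample):
    print(line)
```

Unlike splitting on commas, this keeps working if a value contains a comma or a colon. One caveat: if two events share the same name, json.loads keeps only the last one.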
=========
Edit:
If you are having trouble getting the data out of the file, try something like this (and experiment; it isn't clear exactly how your file is formatted, where the line breaks are, etc.).
f_name = 'logfile.txt'
# use a context manager (look it up)
with open(f_name, 'r') as src:
    # read() returns the whole file as one string, which is what re.findall expects
    data = src.read()
# check it!
print(data)
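To go all the way from the log file to a new file with one event per line, reading and matching can be combined. This is a rough sketch under a few assumptions: the log fits in memory, each event sits on a single line, and no value contains a closing brace. The names split_events and convert_log are hypothetical helpers, not part of any library.

```python
import re

# Capture each brace-delimited body that follows an "EventN": label
EVENT_PATTERN = re.compile(r'"Event\d+":(\{.*?\})')

def split_events(text):
    """Return the {...} part of every event, without the "EventN": prefix."""
    return EVENT_PATTERN.findall(text)

def convert_log(src_name, dst_name):
    """Read the whole log, write one event per line, return the event count."""
    with open(src_name, "r") as src:
        text = src.read()
    events = split_events(text)
    with open(dst_name, "w") as dst:
        for event in events:
            dst.write(event + "\n")
    return len(events)
```

Calling convert_log('logfile.txt', 'events.txt') would then write the cleaned lines to events.txt (a placeholder output name) and return how many events were found, which is a quick way to check that the pattern matched what you expected.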
Upvotes: 1