Dmitry

Reputation: 11

Log file management with Python

I have a file that contains many different events from a service. I want to put each event on its own line and remove some of the surrounding "words & elements". Example of the log file:

"Event1":{"Time":"2022-12-16 16:04:16","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action1","Data":"Datahere"},"Event2":{"Time":"2022-12-16 16:03:59","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action2","Data":"Datahere"},"Event3":{"Time":"2022-12-16 15:54:56","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action3","Data":"Datahere"},

As you can see, every event starts with "EventX". In the end I want to see:

{"Time":"2022-12-16 16:04:16","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action1","Data":"Datahere"}
{"Time":"2022-12-16 16:03:59","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action2","Data":"Datahere"}
{"Time":"2022-12-16 15:54:56","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action3","Data":"Datahere"}

As you can see, the "EventX": labels and the "," separators are removed, and each event is now on its own line in the file.

I'm just a beginner with Python and cannot figure this one out.

Thanks

I tried combining re.search and re.findall without luck. I also tried to find a way to copy only the text between {} and add it back later, again with no luck.

Upvotes: 0

Views: 99

Answers (1)

AirSquid

Reputation: 11883

The construct below works and builds a list of dictionaries from your data. You could condense some of this syntax with list or dictionary comprehensions, but that isn't needed.

If you are having trouble testing the regular expressions, this site is invaluable.

Code

import re  # the standard-library module is enough; the third-party "regex" package is not required

data = '''"Event1":{"Time":"2022-12-16 16:04:16","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action1","Data":"Datahere"},"Event2":{"Time":"2022-12-16 16:03:59","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action2","Data":"Datahere"},"Event3":{"Time":"2022-12-16 15:54:56","Username":"[email protected]","IP_Address":"1.1.1.1","Action":"Action3","Data":"Datahere"},'''

splitter = r'"Event\d+":{(.*?)}'  # a search pattern to capture the stuff in braces

# tokenize the data source...
tokens = re.findall(splitter, data)

#print(tokens)


# now we can operate on the tokens and split them up into key-value pairs and put them into a list
result = []
for token in tokens:
    # make an empty dictionary to hold the row elements
    line_dict = {}
    # we can split the line (token) by comma to get the key-value pairs
    pairs = token.split(',')
    for pair in pairs:
        # another regex split needed here, because the timestamps have colons too
        splitter = r'"(.*)"\s*:\s*"(.*)"'    # capture two groups of things in quotes on opposite sides of colon
        parts = re.search(splitter, pair)
        key, value = parts.group(1), parts.group(2)
        line_dict[key] = value
    # add the dictionary of line elements to the result
    result.append(line_dict)

for d in result:
    print(d)

Output:

{'Time': '2022-12-16 16:04:16', 'Username': '[email protected]', 'IP_Address': '1.1.1.1', 'Action': 'Action1', 'Data': 'Datahere'}
{'Time': '2022-12-16 16:03:59', 'Username': '[email protected]', 'IP_Address': '1.1.1.1', 'Action': 'Action2', 'Data': 'Datahere'}
{'Time': '2022-12-16 15:54:56', 'Username': '[email protected]', 'IP_Address': '1.1.1.1', 'Action': 'Action3', 'Data': 'Datahere'}
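
Note that the dictionaries above print with Python's single-quoted repr, while the question asks for the original double-quoted objects, one per line. Since each event already looks like a JSON key/value pair, another option is to wrap the text in braces and let the standard json module do the parsing. This is only a sketch: it assumes every event body is valid JSON, that the only thing missing is the outer pair of braces, and that event labels are unique (a repeated label would overwrite the earlier one). The `sample` string and the `events_to_lines` helper are made up for illustration.

```python
import json

def events_to_lines(text):
    """Return one compact JSON string per event, in file order."""
    # Drop surrounding whitespace and the trailing comma, then add the
    # outer braces so the whole text becomes a single JSON object.
    wrapped = '{' + text.strip().rstrip(',') + '}'
    events = json.loads(wrapped)  # {"Event1": {...}, "Event2": {...}, ...}
    # separators=(',', ':') keeps the compact style used in the log
    return [json.dumps(body, separators=(',', ':')) for body in events.values()]

# shortened, hypothetical sample in the same shape as the log
sample = '"Event1":{"Time":"2022-12-16 16:04:16","Action":"Action1"},"Event2":{"Time":"2022-12-16 16:03:59","Action":"Action2"},'

for line in events_to_lines(sample):
    print(line)
```

Each printed line is one event with its "EventX": label and the separating comma gone. If the file is not valid JSON once wrapped, json.loads raises an error, and the regex approach is the fallback.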

=========

Edit:

If you are having trouble getting the data out of the file, try something like this and experiment; it isn't clear exactly how your file is formatted (line breaks, etc.).

f_name = 'logfile.txt'

# use a context manager (look it up)
with open(f_name, 'r') as src:
    data = src.read()  # one string, which re.findall needs; readlines() would return a list of lines

# check it!
print(data)
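
Putting the pieces together, here is a rough sketch that reads the whole file as one string, pulls out each group in braces with the same kind of pattern as above, and writes one event per line. The file names and the `split_events` and `convert` helpers are placeholders, and it assumes no event contains nested braces.

```python
import re

# capture each {...} group that follows an "EventN": label;
# DOTALL lets an event span a line break in the source file
EVENT_PATTERN = re.compile(r'"Event\d+":(\{.*?\})', re.DOTALL)

def split_events(text):
    """Return the {...} part of every event, in order of appearance."""
    return EVENT_PATTERN.findall(text)

def convert(src_name, dst_name):
    """Read src_name and write one event per line to dst_name."""
    # read() gives a single string, which is what findall works on
    with open(src_name, 'r') as src:
        text = src.read()
    with open(dst_name, 'w') as dst:
        for event in split_events(text):
            dst.write(event + '\n')

# hypothetical file names; uncomment to run against a real file
# convert('logfile.txt', 'events.txt')
```

Because the braces are captured as part of the group, each output line keeps its original double quotes and only loses the "EventX": label and the trailing comma.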

Upvotes: 1
