Abuvavg

Reputation: 29

Extract time values from a list and add to a new list or array

I have a script that reads through a log file containing hundreds of log entries and looks for the ones whose type is "On", "Off", or "Switch". I then output each matching entry as its own list. I'm trying to find a way to extract the Out and In times into a separate list/array and then subtract the two times to find the duration of each entry. This is what an outputted entry looks like:

['2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a"', '"Type":"Switch"', '"In":"2020-01-31T00:30:20.140Z"']

This is my current code:

logfile = '/path/to/my/logfile'

with open(logfile, 'r') as f:
    text = f.read()
    words = ["On", "Off", "Switch"]
    text2 = text.split('\n')
    for l in text.split('\n'):
        if (words[0] in l or words[1] in l or words[2] in l):
            log = l.split(',')[0:3]

I'm stuck on how to target only the Out and In time values from the logs and put them in an array and convert to a time value to find duration.

Initial log before script: everything after the "In" time is useless for what I'm looking for so I only have the first three indices outputted

2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a","Type":"Switch","In":"2020-01-31T00:30:20.140Z","Path":"interface","message":"interface changed status from unknown to normal","severity":"INFORMATIONAL","display":true,"json_map":"{\"severity\":null,\"eventId\":\"65e-64d9-45-ab62-8ef98ac5e60d\",\"componentPath\":\"interface_css\",\"displayToGui\":false,\"originalState\":\"unknown\",\"closed\":false,\"eventType\":\"InterfaceStateChange\",\"time\":\"2019-04-18T07:04:32.747Z\",\"json_map\":null,\"message\":\"interface_css changed status from unknown to normal\",\"newState\":\"normal\",\"info\":\"Event created with current status\"}","closed":false,"info":"Event created with current status","originalState":"unknown","newState":"normal"}

Upvotes: 1

Views: 164

Answers (2)

Patrick Artner

Reputation: 51663

Regex is probably the way to go (speed, efficiency, etc.) ... but ...

You could take a very simplistic (if very inefficient) approach of cleaning your data:

  • join all of it into a string
  • replace things that hinder easy parsing
  • split wisely and filter the split

like so:

data = ['2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a"', '"Type":"Switch"', '"In":"2020-01-31T00:30:20.140Z"']

all_text = " ".join(data)


# this is inefficient and will create throwaway intermediate strings - if you are
# in a hurry or operate on 100s of MB of data, this is NOT the way to go, unless
# you have time

# iterate pairs of ("bad thing", "what to replace it with") (or list of bad things)
for thing in [ (": ",":"), (list('[]{}"'),"") ]:
    whatt = thing[0]
    withh = thing[1]

    # if list, do so for each bad thing
    if isinstance(whatt, list):
        for p in whatt:
            # replace it
            all_text = all_text.replace(p,withh)
    else:
        all_text = all_text.replace(whatt,withh)

# format is now far better suited to splitting/filtering
# keep only the Out/In timestamps and the part that names the "Switch" type
cleaned = [a for a in all_text.split(" ")
           if a.startswith(("In:", "Out:")) or "Switch" in a]

print(cleaned)

Outputs:

['Out:2020-01-31T00:30:20.150Z', 'Type:Switch', 'In:2020-01-31T00:30:20.140Z']

After the replacements (and before splitting and filtering), all_text looks like this:

2020-01-31T12:04:57.976Z 1234 Out:2020-01-31T00:30:20.150Z Id:Id:4-f-4-9-6a Type:Switch In:2020-01-31T00:30:20.140Z

You can transform the clean list into a dictionary for ease of lookup:

d = dict( part.split(":",1) for part in cleaned)

print(d)

will produce:

{'In': '2020-01-31T00:30:20.140Z', 
 'Type': 'Switch', 
 'Out': '2020-01-31T00:30:20.150Z'}

You can then use the datetime module to parse the times from these values.
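As a minimal sketch of that last step, assuming both values always carry fractional seconds (at most six digits) and a trailing Z as in the example line, datetime.strptime can parse them, and subtracting the results gives a timedelta. The format string and the variable names are my own choices for illustration:

```python
from datetime import datetime

# Assumed format: date, "T", time with fractional seconds, then a literal "Z".
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# The dictionary produced by the cleaning steps above.
d = {"In": "2020-01-31T00:30:20.140Z",
     "Type": "Switch",
     "Out": "2020-01-31T00:30:20.150Z"}

time_in = datetime.strptime(d["In"], TIME_FORMAT)
time_out = datetime.strptime(d["Out"], TIME_FORMAT)

# Subtracting two datetime objects yields a datetime.timedelta.
duration = time_out - time_in
print(duration.total_seconds())
```

For the example entry the two timestamps are 10 milliseconds apart. If some entries lack the fractional part, the format string would need adjusting.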

Upvotes: 0

9769953

Reputation: 12201

Below is a possible solution. The wordmatch line is a bit of a hack, until I find something clearer: it's just a one-liner that creates either an empty set or a one-element set containing True, depending on whether one of the words matches. (Untested)

import re

logfile = '/path/to/my/logfile'

words = ["On", "Off", "Switch"]
dateformat = r'\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[Zz]?'
pattern = fr'Out:\s*\[(?P<out>{dateformat})\].*In":\s*\"(?P<in>{dateformat})\"'
regex = re.compile(pattern)
with open(logfile, 'r') as f:
    for line in f:
        wordmatch = set(filter(None, (word in line for word in words)))
        if wordmatch:
            match = regex.search(line)
            if match:
                intime = match.group('in')
                outtime = match.group('out')
                # whatever to store these strings, e.g., append to list or insert in a dict.

As noted, your log example is very awkward, so this works for the example line, but may not work for every line. Adjust as necessary.

I have also not included a conversion to datetime.datetime objects, in case you want one. For that, read through the datetime module documentation, in particular datetime.strptime. (Alternatively, you may want to store your results in a Pandas table. In that case, read through the Pandas documentation on how to convert strings to actual datetime objects.)
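A rough sketch of how the regex and the datetime conversion could fit together, collecting one duration per matching line. The function name extract_durations, the format string, and the shortened sample line are assumptions for illustration, and it assumes every timestamp ends in Z with at most six fractional digits (the limit of %f):

```python
import re
from datetime import datetime

# Assumed timestamp layout; %f accepts one to six fractional digits.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"
DATE = r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z'
REGEX = re.compile(fr'Out:\s*\[(?P<out>{DATE})\].*?"In":\s*"(?P<in>{DATE})"')


def extract_durations(lines):
    """Return a list of (out, in, out - in) tuples for lines matching REGEX."""
    results = []
    for line in lines:
        match = REGEX.search(line)
        if match:
            out_time = datetime.strptime(match.group("out"), TIME_FORMAT)
            in_time = datetime.strptime(match.group("in"), TIME_FORMAT)
            results.append((out_time, in_time, out_time - in_time))
    return results


# Shortened version of the example line from the question.
sample = ['2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] '
          'Id: {"Id":"4-f-4-9-6a","Type":"Switch","In":"2020-01-31T00:30:20.140Z"}']

for out_time, in_time, duration in extract_durations(sample):
    print(duration.total_seconds())
```

Passing the open file object as lines should work the same way, since the function only iterates over it. The word filter from the loop above is left out here to keep the sketch short.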

You also don't need to read and split on newlines yourself: for line in f will do that for you (provided f is indeed a filehandle).
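A small illustration of that point, with io.StringIO standing in for an open file so that it runs without a real path (the two sample lines are made up):

```python
import io

# io.StringIO behaves like an open text file for the purpose of iteration.
f = io.StringIO("first line\nsecond line\n")

lines = []
for line in f:
    # Each line still ends with its newline character; strip it if it gets in the way.
    lines.append(line.rstrip("\n"))

print(lines)
```

Iterating this way reads one line at a time, so the whole file never has to sit in memory at once.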

Upvotes: 1
