Praveen Kumar S

Reputation: 31

How do I write a regex to match the pattern "Day Month Date TimeStamp Year"?

I want to filter log data with a Python regex, based on the format

"Day Month Date Timestamp Year"
Example: Mon Mar 16 13:03:07 2020

The content of the log file looks something like this:

SR 194
1584363914  0   1   Mon Mar 16 13:05:14 2020    200002305   4
    bay18   cupMonitor
ssListProcessing.c      980

The precursor msgView, required for awStart was not found, so it will be removed.

EN 194

I want to match the date and timestamp using a regex and display the log entry associated with it.

However, I'm only able to match the "timestamp year" part, i.e. 13:05:14 2020, using the following regex:

([0-1]?\d|2[0-3])(?::([0-5]?\d))?(?::([0-5]?\d)) \b(19|20)\d{2}\b

I need to match the entire format, e.g. Mon Mar 16 13:05:14 2020.

Upvotes: 0

Views: 442

Answers (1)

Pierre D

Reputation: 26261

You can build a regex matcher as follows:

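import datetime
import re

# Abbreviated month and weekday names as %b and %a render them in the current locale.
# 1970-01-01 was a Thursday, so the weekday alternation starts with Thu.
months = '|'.join(f'{datetime.datetime(1970, i + 1, 1):%b}' for i in range(12))
weekdays = '|'.join(f'{datetime.datetime(1970, 1, i + 1):%a}' for i in range(7))
# Groups: weekday, month, two-digit day, HH:MM:SS, four-digit year.
pat = re.compile(fr'\b({weekdays})\s+({months})\s(\d\d)\s+(\d\d:\d\d:\d\d)\s+(\d{{4}})\b')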

The pattern of the matcher is:

>>> pat
re.compile(r'\b(Thu|Fri|Sat|Sun|Mon|Tue|Wed)\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s(\d\d)\s+(\d\d:\d\d:\d\d)\s+(\d{4})\b',
           re.UNICODE)
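
To see what each capture group holds, here is a small self-contained sketch. It is a variant, not the exact pattern above: the names `WEEKDAYS`, `MONTHS` and `pat_loose` are made up for illustration, the English abbreviations are hardcoded instead of being derived from the locale, and the day is loosened to one or two digits after any amount of whitespace, on the assumption that some logs pad single-digit days with a space (as in "Mar  6").

```python
import re

# Assumption: English abbreviations, hardcoded rather than taken from the locale.
WEEKDAYS = "Mon|Tue|Wed|Thu|Fri|Sat|Sun"
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Same five groups as the pattern above, but the day accepts 1 or 2 digits
# and may be preceded by more than one whitespace character.
pat_loose = re.compile(
    rf"\b({WEEKDAYS})\s+({MONTHS})\s+(\d{{1,2}})\s+(\d\d:\d\d:\d\d)\s+(\d{{4}})\b"
)

line = "1584363914  0   1   Mon Mar 16 13:05:14 2020    200002305   4"
m = pat_loose.search(line)
groups = m.groups()   # weekday, month, day, time, year
whole = m.group(0)    # the full matched timestamp text

# A space-padded single-digit day also matches with this variant.
padded = pat_loose.search("Fri Mar  6 09:00:00 2020")

print(groups)
print(whole)
```

Whether the looser day field is needed depends on how your logger formats days below 10.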

You can also get a datetime from the matched groups:

def get_datetime(m):
    return datetime.datetime.strptime(
        ' '.join(m.groups()[1:]), '%b %d %H:%M:%S %Y')
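
Note that the regex only checks the shape of the fields, so a line containing something like 99:99:99 would still match and `strptime` would then raise `ValueError`. A possible guard, sketched with a hypothetical helper name `safe_get_datetime` and a simplified stand-in pattern `pat_ts` (both are assumptions for this example only):

```python
import datetime
import re

# Simplified stand-in pattern for this sketch: any capitalised three-letter
# words for weekday and month, then day, time and year.
pat_ts = re.compile(
    r"\b([A-Z][a-z]{2})\s+([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d\d:\d\d:\d\d)\s+(\d{4})\b"
)

def safe_get_datetime(m):
    """Return a datetime for the match, or None if strptime rejects the fields."""
    try:
        # Skip group 1 (the weekday) and parse month, day, time and year.
        return datetime.datetime.strptime(
            " ".join(m.groups()[1:]), "%b %d %H:%M:%S %Y")
    except ValueError:
        return None

good = safe_get_datetime(pat_ts.search("Mon Mar 16 13:05:14 2020"))
bad = safe_get_datetime(pat_ts.search("Mon Mar 16 99:99:99 2020"))
print(good, bad)
```

Returning None lets the caller skip malformed lines instead of stopping at the first one.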

Example usage:

txt = """
SR 194
1584363914  0   1   Mon Mar 16 13:05:14 2020    200002305   4
    bay18   cupMonitor
ssListProcessing.c      980

The precursor msgView, required for awStart was not found, so it will be removed.

EN 194
"""

for s in txt.splitlines():
    if m := pat.search(s):
        t = get_datetime(m)
        print(f'{t:%Y-%m-%d %H:%M:%S} - {s}')

Output:

2020-03-16 13:05:14 - 1584363914  0   1   Mon Mar 16 13:05:14 2020    200002305   4
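
If you want the whole log entry associated with the timestamp rather than just the matching line, one option is to collect the lines of each record first. The sketch below assumes, based only on the sample in the question, that every record opens with a line "SR <id>" and closes with "EN <id>"; the names `START`, `END`, `STAMP` and `iter_records` are hypothetical.

```python
import re

# Assumption: records are delimited by "SR <id>" and "EN <id>" lines.
START = re.compile(r"^SR\s+(\d+)\s*$")
END = re.compile(r"^EN\s+(\d+)\s*$")
STAMP = re.compile(
    r"\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s+"
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"\d{1,2}\s+\d\d:\d\d:\d\d\s+\d{4}\b"
)

def iter_records(text):
    """Yield (timestamp_text, record_lines) for each SR..EN block with a timestamp."""
    record, stamp = None, None
    for line in text.splitlines():
        if START.match(line):
            # A new record begins; start collecting its lines.
            record, stamp = [line], None
        elif record is not None:
            record.append(line)
            # Remember the first timestamp seen inside the record.
            if stamp is None and (m := STAMP.search(line)):
                stamp = m.group(0)
            if END.match(line):
                if stamp is not None:
                    yield stamp, record
                record = None

sample = """\
SR 194
1584363914  0   1   Mon Mar 16 13:05:14 2020    200002305   4
    bay18   cupMonitor
ssListProcessing.c      980

The precursor msgView, required for awStart was not found, so it will be removed.

EN 194
"""

records = list(iter_records(sample))
for stamp, lines in records:
    print(stamp)
    print("\n".join(lines))
```

Records without a timestamp are silently skipped here; adjust that if you need to report them.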

Upvotes: 3
