CS1999

Reputation: 429

Custom pattern matching in Python

I am trying to write a simple Python program that reads a log file and extracts specific values. This is the kind of log line I want to look for:

2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269

I have many topics, such as myTopic2, myTopic3, etc.

I want to detect all lines that report the total incoming bytes for the various topics and extract the value. Is there an easy and efficient way to do so? Basically, I want to match the following pattern:

2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.${}.TotalIncomingBytes.Count, value=${}

Ignoring the timestamp, of course.

Upvotes: 0

Views: 44

Answers (2)

Ouroborus

Reputation: 16865

Maybe something like this:

resultLines = []
resultSums = {}
with open('recent.logs') as f:
    for idx, line in enumerate(f):
        pieces = line.rsplit('.TotalIncomingBytes.Count, value=', 1)
        if len(pieces) != 2: continue

        value = pieces[1]

        pieces = pieces[0].rsplit(' [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.', 1)
        if len(pieces) != 2: continue

        topic = pieces[1]
        value = int(value)

        resultLines.append({
            'idx': idx,
            'line': line,
            'topic': topic,
            'value': value,
        })

        if topic not in resultSums:
            resultSums[topic] = 0
        resultSums[topic] = resultSums[topic] + value

for topic, value in resultSums.items():  # .iteritems() exists only in Python 2
    print(topic, value)
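
To make the two `rsplit` steps easier to follow, here is a minimal sketch that applies them to a single line. The helper name `parse_line` and the constants `SUFFIX` and `PREFIX` are illustrative names, not part of the code above; the sample line is the one from the question.

```python
# Fixed text that surrounds the topic name and the value in a matching line.
SUFFIX = '.TotalIncomingBytes.Count, value='
PREFIX = ' [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.'


def parse_line(line):
    """Return (topic, value) for a matching line, or None otherwise."""
    # Split once from the right: everything after SUFFIX is the value.
    pieces = line.rsplit(SUFFIX, 1)
    if len(pieces) != 2:
        return None
    value = pieces[1]

    # Split the remainder once from the right: everything after PREFIX is the topic.
    pieces = pieces[0].rsplit(PREFIX, 1)
    if len(pieces) != 2:
        return None

    # int() tolerates the trailing newline left over from reading the file.
    return pieces[1], int(value)


sample = ('2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - '
          'type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269\n')

print(parse_line(sample))              # ('myTopic1', 20725269)
print(parse_line('unrelated line\n'))  # None
```

Because both splits look for fixed text, a line from a different writer thread or log level would not match; that is a property of this approach, not of the log format.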

Upvotes: 1

Tim Roberts

Reputation: 54678

Here's the way I would do it. This could also be done with a regular expression.

data = """\
2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269
2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269
2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269
"""

counts = {}

for line in data.splitlines():
    if '[INFO ] metrics' in line:
        parts = line.split(' - ')
        parts = parts[1].split(', ')
        dct = {}
        for part in parts:
            key,val = part.split('=')
            dct[key] = val
        if dct['name'] not in counts:
            counts[dct['name']] = int(dct['value'])
        else:
            counts[dct['name']] += int(dct['value'])

print(counts)

Output:

{'Topic.myTopic1.TotalIncomingBytes.Count': 62175807}

Here's a regex version:


import re

pattern = re.compile(r".* - type=([^,]*), name=([^,]*), value=([^,]*)")
counts = {}

for line in data.splitlines():
    if '[INFO ] metrics' in line:
        parts = pattern.match(line)
        if parts is None:
            # Skip metrics lines that do not have the type/name/value layout.
            continue
        if parts[2] not in counts:
            counts[parts[2]] = int(parts[3])
        else:
            counts[parts[2]] += int(parts[3])

print(counts)
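
If only the `TotalIncomingBytes` lines matter and the bare topic name is wanted as the key, as in the `Topic.${}.TotalIncomingBytes.Count, value=${}` pattern from the question, a narrower regex with named groups is one option. This is a sketch under assumptions: the function name `total_incoming_bytes` is hypothetical, and the `myTopic2` line and the non-metrics line in the sample are made up for illustration.

```python
import re
from collections import defaultdict

# Matches only the total-incoming-bytes metric; the timestamp and the
# bracketed fields before "name=" are ignored because search() scans the line.
TOPIC_BYTES = re.compile(
    r"name=Topic\.(?P<topic>.+?)\.TotalIncomingBytes\.Count, value=(?P<value>\d+)"
)


def total_incoming_bytes(lines):
    """Sum the reported values per topic over an iterable of log lines."""
    totals = defaultdict(int)
    for line in lines:
        match = TOPIC_BYTES.search(line)
        if match:
            totals[match.group("topic")] += int(match.group("value"))
    return dict(totals)


prefix = "2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, "
sample_lines = [
    prefix + "name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269",
    prefix + "name=Topic.myTopic2.TotalIncomingBytes.Count, value=100",  # made-up value
    prefix + "name=Topic.myTopic1.TotalIncomingBytes.Count, value=1",    # made-up value
    "2022-12-02 13:13:11.000 [main] [INFO ] startup complete",           # made-up, skipped
]

print(total_incoming_bytes(sample_lines))
# {'myTopic1': 20725270, 'myTopic2': 100}
```

For a real file, passing the open file object works the same way, since it yields one line at a time: `with open('recent.logs') as f: total_incoming_bytes(f)`.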

Upvotes: 0
