rodlozarg

Reputation: 57

Extract specific strings from a single long line

I'm trying to extract the IDs of several network interfaces from a single long line of text that contains multiple IDs. I already tried using split without success. I would appreciate any help.

This is a sample of the input. Remember, it is all on a single line of text.

"Authentication success on Interface Gi1/0/20 AuditSessionID 0000000XXXXXXXXXX, Authentication success on Interface Gi1/0/24 AuditSessionID 0000000XXXXXXXXXX, Authentication not succeed on Interface Fi1/0/10 AuditSessionID 0000000XXXXXXXXXX"

I'm expecting the output to be just: Gi1/0/20 Gi1/0/24 Fi1/0/10
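The question mentions trying split. Here is a minimal sketch of a split-only approach for comparison. It assumes the interface name is always the whitespace-separated word immediately after the word "Interface". The helper name `interfaces_by_split` is hypothetical and does not come from the original post.

```python
# Sample input from the question, kept on one logical line.
line = ("Authentication success on Interface Gi1/0/20 AuditSessionID 0000000XXXXXXXXXX, "
        "Authentication success on Interface Gi1/0/24 AuditSessionID 0000000XXXXXXXXXX, "
        "Authentication not succeed on Interface Fi1/0/10 AuditSessionID 0000000XXXXXXXXXX")

def interfaces_by_split(text):
    """Hypothetical helper: return the word that follows each 'Interface' token."""
    words = text.split()  # split on any run of whitespace
    # Iterate over all words but the last, so words[i + 1] always exists.
    return [words[i + 1] for i, w in enumerate(words[:-1]) if w == "Interface"]

print(" ".join(interfaces_by_split(line)))  # Gi1/0/20 Gi1/0/24 Fi1/0/10
```

This avoids regex entirely, but it relies on the interface name never containing spaces.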

Upvotes: 2

Views: 131

Answers (2)

ggorlen

Reputation: 56945

It's not entirely clear what properties define the pattern you want to extract, but here's a strict regex that matches an uppercase letter followed by a lowercase letter, a digit, a slash, another digit, then a slash and two digits. You can easily extend this to allow repetitions and other characters, should they appear in the input string.

import re

s = "Authentication success on Interface Gi1/0/20 AuditSessionID 0000000XXXXXXXXXX, Authentication success on Interface Gi1/0/24 AuditSessionID 0000000XXXXXXXXXX, Authentication not succeed on Interface Fi1/0/10 AuditSessionID 0000000XXXXXXXXXX"

print(re.findall(r"[A-Z][a-z]\d/\d/\d\d", s))

Output:

['Gi1/0/20', 'Gi1/0/24', 'Fi1/0/10']
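As a sketch of the extension mentioned above: the strict pattern misses names with a single-digit port or a longer prefix. The extra sample names below (Gi1/0/5, Te1/1/1, GigabitEthernet1/0/24) are assumptions for illustration and do not appear in the question's input.

```python
import re

# Hypothetical input with name shapes the strict pattern does not cover.
s = "Interface Gi1/0/20, Interface Gi1/0/5, Interface Te1/1/1, Interface GigabitEthernet1/0/24"

# Original strict pattern: exactly two letters, then d/d/dd.
strict = re.findall(r"[A-Z][a-z]\d/\d/\d\d", s)

# Looser pattern: an uppercase letter, then one or more letters of either case,
# one or more digits, and one or more "/number" groups. The group is
# non-capturing (?:...), so findall returns the whole match.
loose = re.findall(r"[A-Z][A-Za-z]+\d+(?:/\d+)+", s)

print(strict)  # ['Gi1/0/20']
print(loose)   # ['Gi1/0/20', 'Gi1/0/5', 'Te1/1/1', 'GigabitEthernet1/0/24']
```

The `[A-Za-z]+` part matters: with `[a-z]+` alone, "GigabitEthernet1/0/24" would only match from the inner capital, giving "Ethernet1/0/24".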

Upvotes: 1

Will Da Silva

Reputation: 7040

Regex is well suited to this task:

import re

text = 'Authentication success on Interface Gi1/0/20 AuditSessionID 0000000XXXXXXXXXX, Authentication success on Interface Gi1/0/24 AuditSessionID 0000000XXXXXXXXXX, Authentication not succeed on Interface Fi1/0/10 AuditSessionID 0000000XXXXXXXXXX'
re.findall('Interface (.*?) ', text)

re.findall() returns a list containing what you want:

['Gi1/0/20', 'Gi1/0/24', 'Fi1/0/10']

The pattern 'Interface (.*?) ' works by matching everything that begins with the word "Interface", followed by a space, then something or nothing, then another space. That something or nothing is represented by (.*?), which captures (i.e. adds to the output of re.findall()) whatever is matched by .*?: any character (.), any number of times (*), but as few times as possible (?, which makes the * non-greedy). You can play around with regexes on sites like https://regex101.com/, which lets you test Python regexes and explains them (better than I can).
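To make the role of the ? concrete, here is a small sketch comparing the lazy pattern with its greedy counterpart on the question's input. It also tries `Interface (\S+)`, an alternative that is not part of the answer above: \S+ matches a run of non-whitespace characters, so it does not need a trailing space. The `tail` string is a hypothetical line that ends right after the interface name.

```python
import re

text = ("Authentication success on Interface Gi1/0/20 AuditSessionID 0000000XXXXXXXXXX, "
        "Authentication success on Interface Gi1/0/24 AuditSessionID 0000000XXXXXXXXXX, "
        "Authentication not succeed on Interface Fi1/0/10 AuditSessionID 0000000XXXXXXXXXX")

# Lazy (non-greedy): .*? takes as few characters as possible,
# so each match stops at the first space after the interface name.
lazy = re.findall(r"Interface (.*?) ", text)

# Greedy: .* takes as many characters as possible and only backs off to the
# LAST space in the string, so all three records collapse into one match.
greedy = re.findall(r"Interface (.*) ", text)

# Hypothetical line ending in an interface name with no trailing space.
tail = "Authentication success on Interface Gi1/0/1"

print(lazy)                                   # ['Gi1/0/20', 'Gi1/0/24', 'Fi1/0/10']
print(len(greedy))                            # 1
print(re.findall(r"Interface (.*?) ", tail))  # [] because no space follows the name
print(re.findall(r"Interface (\S+)", tail))   # ['Gi1/0/1']
```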

Upvotes: 5
