Reputation:
I have a regex pattern, -\w{8}-, to retrieve certain data, but I only want the first occurrence of it on each line.
The data I have is:
access-list xxx line 328 extended permit object-group RLS-Test-5ee67f4d-service-ports
access-list xxx line 329 extended permit object-group WEB-564dcfd5-service-ports123-
access-list xxx line 330 extended permit object-group WEB-564dcfd5-service-ports
access-list xxx line 331 extended permit object-group WEB-564dcfd5-service-ports-openwire-2
access-list xxx line 332 extended permit object-group RLS-Test-2-73d6bba4-service-ports-openwire-1
access-list xxx line 333 extended permit object-group RLS-Test-2-73d6bba4-service-ports-openwire-2
access-list xxx line 334 extended permit object-group SQL-85145d21-web-ports
access-list xxx line 335 extended permit object-group SQL-85145d21-open-access
access-list xxx line 336 extended permit object-group SQl-85145d21-open-access
I want it to return:
-5ee67f4d-
-564dcfd5-
-73d6bba4-
-85145d21-
But if I use the regex -\w{8}- it also selects other 8-character words between hyphens further along the string. See https://regex101.com/r/1rS2Gl/4/
Is there a way to select only the first match per line? The other methods I have tried return only the match from the first line and omit the rest of the data.
Upvotes: 2
Views: 66
Reputation: 626835
Read the text or file line by line, use re.search to find the first occurrence on each line, and collect the Group 1 values:
import re
data = """access-list xxx line 329 extended permit object-group WEB-564dcfd5-service-ports123-
access-list xxx line 330 extended permit object-group WEB-564dcfd5-service-ports
access-list xxx line 331 extended permit object-group WEB-564dcfd5-service-ports-openwire-2
access-list xxx line 332 extended permit object-group RLS-Test-2-73d6bba4-service-ports-openwire-1
access-list xxx line 333 extended permit object-group RLS-Test-2-73d6bba4-service-ports-openwire-2
access-list xxx line 334 extended permit object-group SQL-85145d21-web-ports
access-list xxx line 335 extended permit object-group SQL-85145d21-open-access
access-list xxx line 336 extended permit object-group SQl-85145d21-open-access"""
results = []
rx = re.compile(r'-(\w{8})-')
for line in data.splitlines():
    m = rx.search(line)  # re.search stops at the first match on the line
    if m:
        results.append(m.group(1))
        # Use results.append(m.group()) if you really need to include hyphens
print(results)
# => ['564dcfd5', '564dcfd5', '564dcfd5', '73d6bba4', '73d6bba4', '85145d21', '85145d21', '85145d21']
See the Python demo (and this demo outputs matches with hyphens).
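If the input is already one multi-line string and the unique values listed in the question are what is wanted, a single pattern may also work. The following is only a sketch, not part of the original answer: it assumes the re.MULTILINE flag with a lazy prefix, keeps the hyphens, and removes duplicates with dict.fromkeys. The variable names first_per_line and unique are illustrative.

```python
import re

# A few sample lines copied from the question.
data = """access-list xxx line 328 extended permit object-group RLS-Test-5ee67f4d-service-ports
access-list xxx line 329 extended permit object-group WEB-564dcfd5-service-ports123-
access-list xxx line 331 extended permit object-group WEB-564dcfd5-service-ports-openwire-2
access-list xxx line 332 extended permit object-group RLS-Test-2-73d6bba4-service-ports-openwire-1
access-list xxx line 334 extended permit object-group SQL-85145d21-web-ports"""

# With re.MULTILINE, ^ anchors at the start of every line. The lazy .*? cannot
# cross a newline, so only the first -xxxxxxxx- on each line is captured.
first_per_line = re.findall(r'^.*?(-\w{8}-)', data, flags=re.MULTILINE)

# dict.fromkeys preserves insertion order, so duplicates are dropped in place.
unique = list(dict.fromkeys(first_per_line))
print(unique)
# => ['-5ee67f4d-', '-564dcfd5-', '-73d6bba4-', '-85145d21-']
```

Because every match must begin at a line start, later 8-character words on the same line (such as -ports123- or -openwire-) are never reached.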
Upvotes: 1
Reputation: 823
It looks like your strings have the same character set as a standard GUID. Try -[abcdef0-9]{8}- to limit it to hex digits.
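A quick comparison on one of the question's lines shows the difference. This is a minimal sketch that assumes the IDs are always lowercase hex; the class [a-f0-9] is equivalent to [abcdef0-9], and the variable names are illustrative.

```python
import re

line = ("access-list xxx line 329 extended permit object-group "
        "WEB-564dcfd5-service-ports123-")

# \w also accepts letters outside a-f, so "ports123" is picked up as well.
word_matches = re.findall(r'-\w{8}-', line)

# Restricting the class to hex digits leaves only the ID.
hex_matches = re.findall(r'-[a-f0-9]{8}-', line)

print(word_matches)  # => ['-564dcfd5-', '-ports123-']
print(hex_matches)   # => ['-564dcfd5-']
```

Note that an 8-character name made up only of hex letters (for example "deadbeef") would still match, so taking the first match per line remains the safer approach if such names can occur.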
Upvotes: 1