user7862908

How to match only the first occurrence of a string per line with regex

I have a regex pattern, -\w{8}-, to retrieve certain data, but I only want the first occurrence on each line.

The data I have is:

access-list xxx line 328 extended permit object-group RLS-Test-5ee67f4d-service-ports
access-list xxx line 329 extended permit object-group WEB-564dcfd5-service-ports123-
access-list xxx line 330 extended permit object-group WEB-564dcfd5-service-ports
access-list xxx line 331 extended permit object-group WEB-564dcfd5-service-ports-openwire-2
access-list xxx line 332 extended permit object-group RLS-Test-2-73d6bba4-service-ports-openwire-1
access-list xxx line 333 extended permit object-group RLS-Test-2-73d6bba4-service-ports-openwire-2
access-list xxx line 334 extended permit object-group SQL-85145d21-web-ports
access-list xxx line 335 extended permit object-group SQL-85145d21-open-access
access-list xxx line 336 extended permit object-group SQl-85145d21-open-access

I want it to return:

-5ee67f4d-
-564dcfd5-
-73d6bba4-
-85145d21-

But if I use the regex -\w{8}- it also selects other 8-character words between hyphens further along the string. See https://regex101.com/r/1rS2Gl/4/

Is there a way to select only the first match per line? The other methods I have tried return only the match from the first line and omit the rest of the data.

Upvotes: 2

Views: 66

Answers (2)

Wiktor Stribiżew

Reputation: 626835

Read the text (or file) line by line, use re.search to find the first match on each line, and collect the Group 1 values:

import re
data = """access-list xxx line 329 extended permit object-group WEB-564dcfd5-service-ports123-
access-list xxx line 330 extended permit object-group WEB-564dcfd5-service-ports
access-list xxx line 331 extended permit object-group WEB-564dcfd5-service-ports-openwire-2
access-list xxx line 332 extended permit object-group RLS-Test-2-73d6bba4-service-ports-openwire-1
access-list xxx line 333 extended permit object-group RLS-Test-2-73d6bba4-service-ports-openwire-2
access-list xxx line 334 extended permit object-group SQL-85145d21-web-ports
access-list xxx line 335 extended permit object-group SQL-85145d21-open-access
access-list xxx line 336 extended permit object-group SQl-85145d21-open-access"""
results = []

rx = re.compile(r'-(\w{8})-')

for line in data.splitlines():
    m = rx.search(line)
    if m:
        results.append(m.group(1))
        # Use results.append(m.group()) if you really need to include hyphens

print(results)
# => ['564dcfd5', '564dcfd5', '564dcfd5', '73d6bba4', '73d6bba4', '85145d21', '85145d21', '85145d21']

See the Python demo (and this demo outputs matches with hyphens).
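As an alternative to the per-line loop, the same "first match per line" idea can be sketched with a single pattern over the whole text. This is only a sketch, not part of the original answer: it assumes the input is one multi-line string, and the names first_per_line and unique_ids are made up for illustration. The last step removes repeated values, since the expected output in the question lists each ID once.

```python
import re

# A few sample lines taken from the question.
data = """access-list xxx line 328 extended permit object-group RLS-Test-5ee67f4d-service-ports
access-list xxx line 330 extended permit object-group WEB-564dcfd5-service-ports
access-list xxx line 331 extended permit object-group WEB-564dcfd5-service-ports-openwire-2
access-list xxx line 332 extended permit object-group RLS-Test-2-73d6bba4-service-ports-openwire-1
access-list xxx line 334 extended permit object-group SQL-85145d21-web-ports"""

# With re.MULTILINE, ^ matches at the start of every line. The lazy .*?
# stops at the first -\w{8}- on that line, and "." never crosses a newline,
# so each line contributes at most one match.
first_per_line = re.findall(r'^.*?(-\w{8}-)', data, flags=re.MULTILINE)
print(first_per_line)
# => ['-5ee67f4d-', '-564dcfd5-', '-564dcfd5-', '-73d6bba4-', '-85145d21-']

# dict.fromkeys drops repeats while keeping first-seen order.
unique_ids = list(dict.fromkeys(first_per_line))
print(unique_ids)
# => ['-5ee67f4d-', '-564dcfd5-', '-73d6bba4-', '-85145d21-']
```

Lines with no match are simply skipped, so there is no need for an explicit "if m:" check here.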

Upvotes: 1

Adrian Roworth

Reputation: 823

It looks like your strings have the same character set as a standard GUID. Try -[abcdef0-9]{8}- to limit it to hex digits.
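A quick way to see the difference, sketched on one line from the question (the variable names broad and hex_only are illustrative only). Note that this narrows what can match rather than limiting the match count, so a line holding two hex groups would still yield two results; re.search would be needed if that can happen.

```python
import re

line = ("access-list xxx line 332 extended permit object-group "
        "RLS-Test-2-73d6bba4-service-ports-openwire-1")

# \w also matches ordinary letters, so the 8-letter word "openwire" is picked up.
broad = re.findall(r'-\w{8}-', line)
print(broad)
# => ['-73d6bba4-', '-openwire-']

# Restricting the class to hex digits leaves only the ID on this line.
hex_only = re.findall(r'-[abcdef0-9]{8}-', line)
print(hex_only)
# => ['-73d6bba4-']
```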

Upvotes: 1
