greencode

Reputation: 57

re.findall with or logic

When I use '|' in a regex with re.findall, I get a list of 2-item tuples in which one of the items is blank.

I tried changing the regex format a few times but nothing worked. This is what I have so far after trying different variations:

Example filenames:

231_HELLO_01.jpg
01_HELLO_WORLD.jpg
HELLO_01_WORLD.jpg

Code

    pattern = r'_(\d{2}).?|^(\d{2})_'
    finddupe = re.findall(pattern, filename)

The output looks like this:

[('01', '')]
[('02', '')]
[('01', '')]
[('02', '')]
[('01', '')]
[('02', '')]
[('03', '')]
[('04', '')]
[('05', '')]
[('06', '')]
[('07', '')]
[]

I am just looking to get the number without the empty strings and lists.

Looking for:

01
02
01
03
04

Upvotes: 1

Views: 1195

Answers (2)

brunoluvizotto

Reputation: 95

Ok, I can't tell if it is going to cover all your data, but you can try the following:

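import re

names = ["231_HELLO_01.jpg", "01_HELLO_WORLD.jpg", "HELLO_01_WORLD.jpg"]

# Join the names with a space so a number at the start of a later name still has a non-digit character before it.
result = re.findall(r"[^\d](\d{2})[^\d]", ' '.join(names))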

The value of result after running it is:

>>> result
['01', '01', '01']
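
One caveat with the joined-string approach: the pattern needs a non-digit character on both sides of the number, so a two-digit number at the very start of the first name would be missed. A possible variant, offered only as a sketch and not as part of the answer above, uses lookarounds so that each filename can be checked on its own. The helper name two_digit_runs is made up for illustration, and unlike the pattern in the question this version does not require an underscore next to the number, which is an assumption about the data.

```python
import re

# Lookarounds assert "no digit here" without consuming a character,
# so the start and end of the string also count as boundaries.
TWO_DIGITS = re.compile(r"(?<!\d)\d{2}(?!\d)")


def two_digit_runs(filename):
    """Hypothetical helper: all standalone two-digit runs in one filename."""
    return TWO_DIGITS.findall(filename)


names = ["231_HELLO_01.jpg", "01_HELLO_WORLD.jpg", "HELLO_01_WORLD.jpg"]
for name in names:
    # The three-digit run "231" is skipped because it is not exactly two digits.
    print(name, two_digit_runs(name))
```

Because the pattern has no capture group, re.findall returns a flat list of strings here rather than tuples.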

Upvotes: 0

Wiktor Stribiżew

Reputation: 627101

You may remove the .? from the first alternative, since it does not change what the groups capture, and then concatenate the group values upon a match:

import re
pattern = re.compile(r'^(\d{2})_|_(\d{2})')
m = pattern.search('12_text')
finddupe = ""
if m:
    finddupe = f"{m.group(1) or ''}{m.group(2) or ''}"
    # finddupe = "{}{}".format(m.group(1) or '', m.group(2) or '')  # for Python versions before 3.6, which lack f-strings
print(finddupe)


I see you need only the first match in each string, so there is no point in using re.findall, which returns all matches; re.search should suffice.
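
To make the difference concrete, here is a minimal sketch that runs the same pattern over the example filenames from the question. The function name first_number is hypothetical and only there for illustration. With two capture groups, re.findall returns one tuple per match with '' for the group that did not take part, whereas re.search gives a match object from which the one non-empty group can be picked.

```python
import re

PATTERN = re.compile(r"^(\d{2})_|_(\d{2})")


def first_number(filename):
    """Hypothetical helper: first two-digit number per the pattern, or None."""
    m = PATTERN.search(filename)
    if m is None:
        return None
    # Exactly one of the two groups took part in the match; the other is None.
    return m.group(1) or m.group(2)


names = ["231_HELLO_01.jpg", "01_HELLO_WORLD.jpg", "HELLO_01_WORLD.jpg"]
for name in names:
    print(first_number(name))

# For contrast: findall with two groups yields tuples padded with ''.
print(PATTERN.findall("01_HELLO_WORLD.jpg"))
```

Returning None for a filename without a match lets the caller skip it, instead of collecting empty strings or empty lists.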

Upvotes: 1
