Reputation: 57
I am getting a list of 2 items when using an '|' in regex findall, one of which is blank
I tried changing the regex format a few times but nothing worked. This is what I have so far after trying different variations:
Example filenames:
231_HELLO_01.jpg
01_HELLO_WORLD.jpg
HELLO_01_WORLD.jpg
Code
pattern = '_(\d{2}).?|^(\d{2})_'
finddupe = re.findall(pattern, filename)
The output looks like this:
[('01', '')]
[('02', '')]
[('01', '')]
[('02', '')]
[('01', '')]
[('02', '')]
[('03', '')]
[('04', '')]
[('05', '')]
[('06', '')]
[('07', '')]
[]
I am just looking to get the number without the empty strings and lists.
Looking for:
01
02
01
03
04
Upvotes: 1
Views: 1195
Reputation: 95
Ok, I can't tell if it is going to cover all your data, but you can try the following:
import re
names = ["231_HELLO_01.jpg", "01_HELLO_WORLD.jpg", "HELLO_01_WORLD.jpg"]
result = re.findall(r"[^\d](\d{2})[^\d]", ' '.join(names))
The value of result after running it is:
>>> result
['01', '01', '01']
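The pattern above needs a non-digit character on both sides of the number, which is why the names are joined with spaces first. As a rough sketch of a variant (my own assumption, not tested against your full data), lookarounds check the neighbours without consuming them, so each filename can be processed on its own. The helper name two_digit_runs is made up for illustration:

```python
import re

# Exactly two digits that are not part of a longer run of digits.
# (?<!\d) and (?!\d) only inspect the neighbouring characters; they do not
# consume them, so the number may sit at the very start or end of the string.
TWO_DIGITS = re.compile(r"(?<!\d)(\d{2})(?!\d)")

def two_digit_runs(filename):
    """Return every standalone two-digit number in filename (hypothetical helper)."""
    return TWO_DIGITS.findall(filename)

names = ["231_HELLO_01.jpg", "01_HELLO_WORLD.jpg", "HELLO_01_WORLD.jpg"]
for name in names:
    print(name, two_digit_runs(name))
```

With a single capturing group, findall returns plain strings rather than tuples, so there are no empty placeholders to filter out.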
Upvotes: 0
Reputation: 627101
You may remove the .? from the alternative that contains it, since it does not affect matching, and concatenate the group values when there is a match:
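import re
pattern = re.compile(r'^(\d{2})_|_(\d{2})')  # raw string so \d reaches the regex engine unchanged
m = pattern.search('12_text')
finddupe = ""
if m:
    # Only one alternative matches, so the other group is None; "or ''" turns it into an empty string
    finddupe = f"{m.group(1) or ''}{m.group(2) or ''}"
    # finddupe = "{}{}".format(m.group(1) or '', m.group(2) or '') # for Python versions not supporting interpolation
print(finddupe)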
I see you need only the first match in each string, so there is no point in using re.findall, which returns all matches; re.search should suffice.
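To tie this back to the filenames in the question, here is a minimal sketch. The function name first_two_digit_number is only illustrative, and I am assuming None is an acceptable result when a name contains no matching number. It also shows where the empty strings come from: with two capturing groups, re.findall returns one tuple per match, and the group that did not take part in the match is reported as ''.

```python
import re

# Same alternation as above, written as a raw string.
PATTERN = re.compile(r'^(\d{2})_|_(\d{2})')

def first_two_digit_number(filename):
    """Return the first two-digit number found by PATTERN, or None (illustrative helper)."""
    m = PATTERN.search(filename)
    if m is None:
        return None
    # Exactly one of the two groups took part in the match; the other is None.
    return m.group(1) or m.group(2)

names = ["231_HELLO_01.jpg", "01_HELLO_WORLD.jpg", "HELLO_01_WORLD.jpg"]
numbers = [first_two_digit_number(name) for name in names]
print(numbers)  # ['01', '01', '01']

# For comparison: findall with two groups yields a tuple per match,
# with '' standing in for the group that did not match.
print(re.findall(r'_(\d{2}).?|^(\d{2})_', "HELLO_01_WORLD.jpg"))  # [('01', '')]
```

If a name has no such number, the function returns None, which you can skip when collecting the results.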
Upvotes: 1