Reputation: 4621
Suppose I have such a file name and I want to extract part of it as a string in Python
import re
fn = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
rgx = re.compile('\b_[A-Z]{2}\b')
print(re.findall(rgx, fn))
Expected out put [DE]
, but actual out is []
.
Upvotes: 1
Views: 840
Reputation: 36540
You could use regular expression (re
module) for that as already shown, however this could be done without using any import
s, following way:
fn = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
out = [i for i in fn.split('_')[1:] if len(i)==2 and i.isalpha() and i.isupper()]
print(out) # ['DE']
Explanation: I split fn
at _
then discard 1st element and filter elements so only str
s of length 2, consisting of letters and consisting of uppercases remain.
Upvotes: 0
Reputation: 37377
Try pattern: \_([^\_]+)\_[^\_\.]+\.xlsx
Explanation:
\_
- match _
literally
[^\_]+
- negated character class with +
operator: match one or more times character other than _
[^\_\.]+
- same as above, but this time match characters other than _
and .
\.xlsx
- match .xlsx
literally
The idea is to match last pattern _something_
before extension .xlsx
Upvotes: 1
Reputation: 71580
Another re
solution:
rgx = re.compile('_([A-Z]{1,})_')
print(re.findall(rgx, fn))
Upvotes: 0
Reputation: 27723
Your desired output seems to be DE
which is in bounded with two _
from left and right. This expression might also work:
# -*- coding: UTF-8 -*-
import re
string = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
expression = r'_([A-Z]+)_'
match = re.search(expression, string)
if match:
print("YAAAY! \"" + match.group(1) + "\" is a match πππ ")
else:
print('π Sorry! No matches!')
YAAAY! "DE" is a match πππ
Or you can add a 2
quantifier, if you might want:
# -*- coding: UTF-8 -*-
import re
string = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
expression = r'_([A-Z]{2})_'
match = re.search(expression, string)
if match:
print("YAAAY! \"" + match.group(1) + "\" is a match πππ ")
else:
print('π Sorry! No matches!')
Upvotes: 1
Reputation: 43169
You could use
(?<=_)[A-Z]+(?=_)
This makes use of lookarounds on both sides, see a demo on regex101.com. For tighter results, you'd need to specify more sample inputs though.
Upvotes: 2
Reputation: 82765
Use _([A-Z]{2})
Ex:
import re
fn = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
rgx = re.compile('_([A-Z]{2})')
print(rgx.findall(fn)) #You can use the compiled pattern to do findall.
Output:
['DE']
Upvotes: 1