Reputation: 534
I need to strip out (i.e., substitute with nothing) everything in a series of filenames on either side of a numeral in the middle. I can do it in two steps, but I would like to do it in one.
Two steps:
import re
filename = "NRC_401653_XL3213456321_NRCE_KR.pdf"
front_gone = re.sub(r'(\w{3})_(\d{6})_', '', filename)
both_gone = re.sub(r'_NRCE_KR\.pdf', '', front_gone)
This leaves just XL3213456321, which is what I need. I would like to do this in one step.
Upvotes: 2
Views: 149
Reputation: 4021
Try:
import re
filename = "NRC_401653_XL3213456321_NRCE_KR.pdf"
print(re.sub(r"\w{3}_\d+_(\w+)_NRCE_KR\.pdf", r"\1", filename))
Output:
XL3213456321
(\w+) captures the middle code as group number 1. Because the pattern matches the whole filename, passing \1 as the replacement swaps the entire string for just that captured group.
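Since the question mentions a series of filenames, here is a minimal sketch of applying the same pattern to a list. One thing to be aware of: re.sub returns the input unchanged when the pattern does not match, so a stray file would slip through silently. The sketch below uses re.fullmatch instead and returns None for names that do not fit. The helper name extract_code and the second filename are made up for illustration.

```python
import re

# Same pattern as above, compiled once so it can be reused for every filename.
PATTERN = re.compile(r"\w{3}_\d+_(\w+)_NRCE_KR\.pdf")

def extract_code(filename):
    """Return the middle code, or None if the filename does not match."""
    # extract_code is a hypothetical helper name, not part of the re module.
    match = PATTERN.fullmatch(filename)
    return match.group(1) if match else None

filenames = [
    "NRC_401653_XL3213456321_NRCE_KR.pdf",
    "notes.txt",  # made-up example that does not follow the naming scheme
]
codes = [extract_code(name) for name in filenames]
print(codes)
```

This should print ['XL3213456321', None], which makes non-matching names easy to spot and filter out.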
Upvotes: 1