Python regex - strip out beginning and end and leave middle untouched

Question

I need to strip out (i.e., substitute with nothing) everything in a series of filenames on either side of a numeral in the middle. I can do it in two steps, but I would like to do it in one.

Two steps:

import re

filename = "NRC_401653_XL3213456321_NRCE_KR.pdf"

front_gone = re.sub(r'(\w{3})_(\d{6})_', '', filename)

both_gone = re.sub(r'_NRCE_KR\.pdf', '', front_gone)

This will result in just XL3213456321 remaining, which is what I need. I would like to do this in one step.

Andrés Pérez-Albela H. · Accepted Answer

Try:

import re
filename = "NRC_401653_XL3213456321_NRCE_KR.pdf"
print(re.sub(r"\w{3}_\d+_(\w+)_NRCE_KR\.pdf", r"\1", filename))

Output:

XL3213456321

(\w+) captures the middle code as group 1. Because the pattern matches the entire filename, replacing that match with \1 leaves only the captured middle code.
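One thing to watch for across a series of filenames: re.sub returns the input unchanged when the pattern does not match, so a file that does not follow the naming scheme would pass through silently. A minimal sketch of a stricter variant, assuming every valid name has the same prefix shape and always ends in _NRCE_KR.pdf (the helper name middle_code is hypothetical, not from the question):

```python
import re

# Assumption: three word characters, digits, the middle code, then the
# fixed suffix "_NRCE_KR.pdf". Adjust the pattern if your names differ.
PATTERN = re.compile(r"\w{3}_\d+_(\w+)_NRCE_KR\.pdf")

def middle_code(filename):
    """Return the middle code, or None if the filename does not fit the pattern."""
    match = PATTERN.fullmatch(filename)
    return match.group(1) if match else None

print(middle_code("NRC_401653_XL3213456321_NRCE_KR.pdf"))  # XL3213456321
print(middle_code("notes.txt"))  # None
```

Returning None for a non-matching name makes bad inputs easy to spot, where re.sub would simply hand back the original string.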
