Marc Adler

Python regex - strip out beginning and end and leave middle untouched

I need to strip out (i.e., substitute with nothing) everything on either side of an alphanumeric code in the middle of a series of filenames. I can do it in two steps, but I would like to do it in one.

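Two steps:

import re
filename = "NRC_401653_XL3213456321_NRCE_KR.pdf"
# Step 1: remove the "NRC_401653_" prefix
front_gone = re.sub(r'(\w{3})_(\d{6})_', '', filename)
# Step 2: remove the "_NRCE_KR.pdf" suffix (the dot is escaped so it only matches a literal ".")
both_gone = re.sub(r'_NRCE_KR\.pdf', '', front_gone)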

This will result in just XL3213456321 remaining, which is what I need. I would like to do this in one step.
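For comparison, the two substitutions above can be folded into a single re.sub call by joining the prefix and suffix patterns with |. This is a minimal sketch that assumes every filename has the same three-character/six-digit prefix and the literal _NRCE_KR.pdf suffix. The anchors ^ and $ restrict each alternative to the start and end of the string, so nothing in the middle can be removed by accident.

```python
import re

filename = "NRC_401653_XL3213456321_NRCE_KR.pdf"

# One pattern with two alternatives: the prefix anchored at the start (^)
# and the suffix anchored at the end ($). re.sub removes every match,
# so both ends are stripped in one call and the middle is left untouched.
middle = re.sub(r"^\w{3}_\d{6}_|_NRCE_KR\.pdf$", "", filename)
print(middle)  # XL3213456321
```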

Answers (1)

Try:

import re
filename = "NRC_401653_XL3213456321_NRCE_KR.pdf"
# Match the whole filename and keep only the middle code (group 1).
print(re.sub(r"\w{3}_\d+_(\w+)_NRCE_KR\.pdf", r"\1", filename))

Output:

XL3213456321

(\w+) captures the middle code as group 1. Because the pattern matches the entire filename, replacing that match with \1 leaves only the middle code.
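One caveat: re.sub returns its input unchanged when the pattern does not match, so a filename that breaks the naming scheme passes through silently. A possible variant for a series of filenames uses re.fullmatch (Python 3.4+) and returns None on a mismatch instead. This is only a sketch: the helper name extract_code and the "report.pdf" sample are hypothetical, and the character class [A-Za-z0-9]+ assumes the middle code never contains underscores (unlike \w+, which also matches "_").

```python
import re

# Assumed layout: three word characters, six digits, the code, "_NRCE_KR.pdf".
PATTERN = re.compile(r"\w{3}_\d{6}_([A-Za-z0-9]+)_NRCE_KR\.pdf")

def extract_code(name):
    """Return the middle code, or None if the name does not fit the layout."""
    m = PATTERN.fullmatch(name)
    return m.group(1) if m else None

filenames = [
    "NRC_401653_XL3213456321_NRCE_KR.pdf",
    "report.pdf",  # hypothetical name that does not fit the layout
]
for name in filenames:
    print(name, "->", extract_code(name))
```

Returning None makes non-conforming names easy to spot, rather than having them reappear unchanged in the output.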
