Reputation: 25

RegEx replace character instances based on pattern or separator

I have only recently started learning and using regular expressions. I have a tuple of file names returned from os.walk(), like so:

files = ('s8_00.tif', 's9_00.tif', 's10_000.tif', 's11_00.tif')

I am trying to get it to look like this:

files = ('s8_##.tif', 's9_##.tif', 's10_###.tif', 's11_##.tif')

I have tried to use this.

pad2 = re.compile(r'_00?')

for root, dirs, files in seqDirs:
  pad = files[0]  
  p = pad2.sub("#", pad)
  print p

This returns:

p = ('s8#.tif', 's9#.tif', 's10#0.tif', 's11#.tif')

So I changed the expression around to:

pad2 = re.compile('(_)0+')

giving me:

p = ('s8#.tif', 's9#.tif', 's10#.tif', 's11#.tif')

Is the problem in my p = pad2.sub call? Or does the problem lie in my compiled expression? Or is it the "_" in the expression that is breaking it?

I even tried passing an expression into the pad2.sub call just to test it, and that didn't work either. I know I am missing something small here, and I am a bit stuck.

Any and all help will be greatly appreciated along with explanations of logic.

Upvotes: 1

Answers (3)

BostonJohn

Reputation: 2671

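If you want this to work for any digits after the underscore, make your regex

pattern = re.compile(r"_(\d+)")

and do the substitution with a function, so the length of the match is known:

pattern.sub(lambda m: "_" + "#" * len(m.group(1)), filename)

In a replacement string you can refer to what the parentheses captured with "\g<1>" for the first group, "\g<2>" for the next set of parentheses, and so on. That text is only expanded inside sub(), though, so len("\g<1>") would always be 5 rather than the length of the match; a function receives the match object and can measure m.group(1) directly. "\d+" matches one or more digit characters. If you specifically want to look only for zeros, you could replace it with "_(0+)".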
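
As a small illustration of the "\g<1>" and "\g<2>" references mentioned above, here is a sketch that swaps two captured groups. The two-group pattern is an assumption made up for this example and is not part of the question.

```python
import re

# Hypothetical pattern with two groups: the prefix before "_" and the digits after it.
pattern = re.compile(r"(s\d+)_(\d+)")

# \g<1> and \g<2> are expanded by sub() for each match it finds.
swapped = pattern.sub(r"\g<2>_\g<1>", "s10_000.tif")
print(swapped)  # 000_s10.tif
```

The replacement is written as a raw string so the backslashes reach the re module unchanged.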

Upvotes: 2

Phillip Schmidt

Reputation: 8818

You're better off finding the matches, calculating the length of them, and then replacing them with that number of #s.
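
One way this could look, as a minimal sketch: it assumes the digits to replace always directly follow the first underscore, and pad_to_hashes is a hypothetical helper name.

```python
import re

def pad_to_hashes(filename):
    # Find the run of digits that follows the underscore.
    match = re.search(r"_(\d+)", filename)
    if match is None:
        return filename  # nothing to replace
    # Measure the digit run and rebuild the name with that many "#" characters.
    start, end = match.span(1)
    return filename[:start] + "#" * (end - start) + filename[end:]

files = ('s8_00.tif', 's9_00.tif', 's10_000.tif', 's11_00.tif')
print(tuple(pad_to_hashes(f) for f in files))
# ('s8_##.tif', 's9_##.tif', 's10_###.tif', 's11_##.tif')
```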

Upvotes: 0

Matthias

Reputation: 13232

We're going to use a function for the replacement, not a string.

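import re

def replacer(data):
    # Replace each zero in the run that follows "_" with "#".
    return re.sub(r'(?<=_)(0+)', lambda m: m.group(0).replace('0', '#'), data)

files = ('s8_000.tif', 's9_00.tif', 's10_000.tif', 's11_00.tif')
# map() does not change files in place, so keep and print its result.
print(tuple(map(replacer, files)))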

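(?<=_) is a positive lookbehind assertion: it requires an underscore directly before the zeros without making it part of the match. You can find an explanation in the docs at Regular Expression Syntax.

0+ matches one or more consecutive zeros.

The lambda function replaces every 0 in the match with a #.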

Upvotes: 5
