MaverickD

Reputation: 1657

re.sub: replace multiple patterns, each with a different word, in a single call

I have file names in this format:

remote_execute___jenkin%.java
remtoe__plat_jenk.java

I want to replace every occurrence of two or three _ with a single _,

which I have done like this:

re.sub('_{2,3}','_',name)

This works and replaces every occurrence of two or three _ with a single _. But in the same re.sub call, I also need to replace .java with .jav.

I wrote this pattern to match both .java and the underscores:

\.java$|_{2,3}

But how can I replace .java in the same re.sub call, without running another re.sub after replacing the underscores?

Right now I am doing it like this:

name = re.sub(r'_{2,3}','_',name)
name = re.sub(r'\.java$','.jav',name)

I want to do the above in a single re.sub call.

Upvotes: 1

Views: 5374

Answers (2)

The fourth bird

Reputation: 163467

For your example data you could use:

_(?=_)|(?<=\.jav)a$

import re
name = "remote_execute___jenkin%.java"
print(re.sub(r'_(?=_)|(?<=\.jav)a$', "", name))

This matches:

  • _(?=_) Match an underscore and use a positive lookahead to assert that the next character is also an underscore. This matches the leading underscores of a run but not the last one in __ or ___
  • | or
  • (?<=\.jav)a$ Use a positive lookbehind to assert that what is on the left is .jav, and match an a at the end of the string
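
As a quick check, here is a minimal sketch that applies this pattern to both file names from the question. The helper name clean is only for illustration and is not part of the original answer.

```python
import re

# Remove every underscore that is followed by another underscore,
# and remove a trailing "a" that comes right after ".jav".
PATTERN = r'_(?=_)|(?<=\.jav)a$'

def clean(name):
    # Every match is replaced by the empty string.
    return re.sub(PATTERN, '', name)

for name in ("remote_execute___jenkin%.java", "remtoe__plat_jenk.java"):
    print(clean(name))
# remote_execute_jenkin%.jav
# remtoe_plat_jenk.jav
```

Note that this variant collapses a run of any length, not only runs of two or three underscores.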

If the underscore must occur exactly 2 or 3 times, you might use:

(?<!_)_{1,2}(?=_[^_])|(?<=\.jav)a$

The part that matches 2 or 3 underscores:

  • (?<!_) Use a negative lookbehind to assert that what is on the left side is not an underscore
  • _{1,2} match an underscore 1 or 2 times
  • (?=_[^_]) Positive lookahead to assert that what follows is an underscore followed by a character that is not an underscore
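
A small sketch of how the stricter pattern behaves, assuming the same empty-string replacement as above. The helper name clean_strict and the a____b.java input are made up for illustration.

```python
import re

# Only runs of exactly two or three underscores are reduced to one;
# a trailing "a" after ".jav" is removed as before.
STRICT_PATTERN = r'(?<!_)_{1,2}(?=_[^_])|(?<=\.jav)a$'

def clean_strict(name):
    return re.sub(STRICT_PATTERN, '', name)

print(clean_strict("remote_execute___jenkin%.java"))  # remote_execute_jenkin%.jav
print(clean_strict("remtoe__plat_jenk.java"))         # remtoe_plat_jenk.jav
print(clean_strict("a____b.java"))                    # a____b.jav (run of four is left alone)
```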


Upvotes: 6

pyeR_biz

Reputation: 1044

Nesting the two re.sub calls in one expression will work:

name = re.sub(r'_{2,3}','_',re.sub(r'\.java$','.jav',name))
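
If it really has to be one re.sub call, another option is to pass a function as the replacement: re.sub calls it with each match object and uses the returned string. The sketch below assumes the combined pattern from the question; the names _replacement and rename are hypothetical.

```python
import re

# One pattern with both alternatives: a run of 2-3 underscores, or a trailing ".java".
PATTERN = re.compile(r'_{2,3}|\.java$')

def _replacement(match):
    # Pick the replacement text based on what was matched.
    return '.jav' if match.group() == '.java' else '_'

def rename(name):
    # Single substitution pass; the callable decides each replacement.
    return PATTERN.sub(_replacement, name)

print(rename("remote_execute___jenkin%.java"))  # remote_execute_jenkin%.jav
print(rename("remtoe__plat_jenk.java"))         # remtoe_plat_jenk.jav
```

This keeps the original _{2,3} quantifier and avoids lookarounds, at the cost of a small helper function.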

Upvotes: 1
