user2624464

Reputation: 23

re.sub on a match.group

for element in f:
   galcode_scan = re.search(ur'blah\.blah\.blah\(\'\w{5,10}', element)

If I use re.sub to replace the blahs with something else while keeping the last part, the \w{5,10} in the replacement is treated as literal text. How do I retain the characters matched by that part of the regular expression?

EDIT:

Here is the complete code

for element in f:
  galcode_scan = re.search(ur'Imgur\.Util\.triggerView\(\'\w{5,10}', element)
  galcode_scan = re.sub(r'Imgur\.Util\.triggerView\(\'\w{5,10}', 'blah\.\w{5,10}',   ur"galcode_scan\.\w{5,10}")
  print galcode_scan

Upvotes: 1

Views: 1023

Answers (2)

glglgl

Reputation: 91017

You can also work this way:

import re
element = "Imgur.Util.triggerView('glglgl')"
galcode_scan = re.search(r"Imgur\.Util\.triggerView\('(\w{5,10})'\)", element)

Now you have a match object that you can use further. With either of

galcode_scan.expand(r'replacement.\1')
galcode_scan.expand(r'replacement.\g<1>')

you get replacement.glglgl as a result.

This works by expanding the backreferences in the template string with the groups captured by the match.
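To illustrate the relationship between the match object and re.sub, here is a minimal sketch written for Python 3 (an assumption; the code above uses Python 2 syntax). The sample string and the name `pattern` are only illustrative. It suggests that `expand` on a match and `re.sub` with the same template give the same text when the pattern covers the whole input:

```python
import re

# Illustrative input, modelled on the sample string used above.
element = "Imgur.Util.triggerView('glglgl')"
pattern = r"Imgur\.Util\.triggerView\('(\w{5,10})'\)"

# Route 1: search first, then expand a template against the match object.
match = re.search(pattern, element)
expanded = match.expand(r"replacement.\1") if match else None

# Route 2: let re.sub find the match and apply the same template.
substituted = re.sub(pattern, r"replacement.\1", element)

print(expanded)
print(substituted)
```

Note that `expand` returns only the expanded template, while `re.sub` returns the whole input with the matched part replaced, so the two differ whenever the pattern matches just a portion of the string.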

Upvotes: 0

fejese

Reputation: 4628

You can use a positive lookahead ((?=...)): the trailing part must be present for the pattern to match, but it is not consumed, so it is not replaced:

re.sub(r"blah\.blah\.blah\('(?=\w{5,10})", "", "blah.blah.blah('qwertyu")

'qwertyu'

If you want to replace the matched prefix with something else, pass the new text as the replacement argument:

re.sub(r"blah\.blah\.blah\('(?=\w{5,10})", "pref:", "blah.blah.blah('qwertyu")

'pref:qwertyu'
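One detail worth checking is what the lookahead does at the boundaries of {5,10}. The sketch below (Python 3 assumed; the short and long inputs are made up for illustration) indicates that a suffix that is too short leaves the string untouched, while a suffix longer than 10 characters is still accepted, because the lookahead only needs to find 5 to 10 word characters starting at that position:

```python
import re

pattern = r"blah\.blah\.blah\('(?=\w{5,10})"

# 7 word characters follow the prefix: the prefix is replaced.
normal = re.sub(pattern, "pref:", "blah.blah.blah('qwertyu")

# Only 3 word characters follow: the lookahead fails, nothing is replaced.
too_short = re.sub(pattern, "pref:", "blah.blah.blah('abc")

# 12 word characters follow: the lookahead is satisfied by the first 10,
# so the upper bound does not reject this input.
too_long = re.sub(pattern, "pref:", "blah.blah.blah('abcdefghijkl")

print(normal)
print(too_short)
print(too_long)
```

If the upper bound matters, the pattern would need something after the lookahead's \w{5,10}, such as a closing quote or \b, to pin down where the code ends.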

You can also do it by capturing that part of the pattern in a group ((...)) and back-referencing it in the replacement (\1, \2, ...):

re.sub(r"blah\.blah\.blah\('(\w{5,10})", r"pref:\1", "blah.blah.blah('qwertyu")

'pref:qwertyu'
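The replacement string only understands backreferences, not pattern syntax such as \w{5,10}, which is why the captured group has to be referred to by number or name. A small sketch of the available forms, assuming Python 3 and a hypothetical group name `code` that is not part of the original pattern:

```python
import re

text = "blah.blah.blah('qwertyu"
# Same pattern as above, with the group given the (hypothetical) name "code".
pattern = r"blah\.blah\.blah\('(?P<code>\w{5,10})"

# Reference the group by number or by name.
by_number = re.sub(pattern, r"pref:\g<1>", text)
by_name = re.sub(pattern, r"pref:\g<code>", text)

# \g<1>0 is group 1 followed by a literal "0"; \10 would mean group 10.
with_digit = re.sub(pattern, r"\g<1>0", text)

# A function can build the replacement when a template is not enough.
upper = re.sub(pattern, lambda m: "pref:" + m.group("code").upper(), text)

print(by_number, by_name, with_digit, upper)
```

The \g<...> form is mostly useful when a digit directly follows the reference, or when named groups make a longer pattern easier to read.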

Update

A more precise pattern for the provided examples:

re.sub("Imgur\.Util\.triggerView'(?=\w{5,10})", "imgurl.com/", "Imgur.Util.triggerView'B1ahblA4")

'imgurl.com/B1ahblA4'

The pattern is an ordinary string, so any part that needs to be dynamic can come from a variable. For example, to apply different mappings:

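import re

# Regex prefix -> replacement text (named "mappings" to avoid shadowing the built-in map)
mappings = {
  r"Imgur\.Util\.triggerView'": 'imgurl.com/',
  r"Example\.Util\.triggerView'": 'example.com/'
}

items = [
  "Imgur.Util.triggerView'B1ahblA4",
  "Example.Util.triggerView'FooBar"
]

for item in items:
  for old, new in mappings.items():
    # The lookahead requires at least 5 word characters after the prefix without consuming them
    pattern = old + r'(?=\w{5,10})'
    if re.match(pattern, item):
      print(re.sub(pattern, new, item))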

imgurl.com/B1ahblA4

example.com/FooBar
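As a possible variation on the loop above, the prefixes could be combined into one compiled pattern, with a function choosing the replacement. This is only a sketch under assumptions: Python 3, the hypothetical names `replacements` and `rewrite`, plain (unescaped) prefixes passed through re.escape, and an extra third input added to show a non-matching case:

```python
import re

# Hypothetical mapping from literal prefixes to their replacements.
replacements = {
    "Imgur.Util.triggerView'": "imgurl.com/",
    "Example.Util.triggerView'": "example.com/",
}

# Escape each literal prefix and join them into a single alternation,
# keeping the lookahead so the code itself is not consumed.
prefixes = "|".join(re.escape(prefix) for prefix in replacements)
pattern = re.compile("(" + prefixes + r")(?=\w{5,10})")

def rewrite(item):
    # Look up the replacement for whichever prefix actually matched.
    return pattern.sub(lambda m: replacements[m.group(1)], item)

results = [
    rewrite("Imgur.Util.triggerView'B1ahblA4"),
    rewrite("Example.Util.triggerView'FooBar"),
    rewrite("Other.Util.triggerView'Nothing"),  # no known prefix: unchanged
]
print(results)
```

Unlike the loop, which prints only items that match at the start, this version returns every item and leaves non-matching ones as they are.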

Upvotes: 1
