Fabian

Reputation: 23

Inserting a string inside of a regex match

I need to insert the string Special:MyLanguage/ into strings like [[ACBDEF]] so that they become [[Special:MyLanguage/ACBDEF]].

The problem is that I need to exclude certain matches where Special:MyLanguage/ should not be inserted: links that already start with Special:MyLanguage/, File:, Image: or Category:.

So simply replacing \[\[ with \[\[Special:MyLanguage/ unfortunately does not work. Replacing \[\[[^(Special:MyLanguage|File:|Image:|Category:)] does not work either, because the match includes the first character after the brackets (a match would be [[A). I've read a lot of tutorials and experimented with $1, \G and the like, but I'm still scratching my head.
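To make the second failure concrete, here is a minimal Python sketch. Python and the helper name naive_insert are assumptions for illustration only. Square brackets create a character class, so [^(Special:...)] matches any single character that is not one of the listed letters or symbols, and that one character is then consumed by the replacement.

```python
import re

# The pattern from the second attempt. Inside [...] the parentheses and
# pipes are literal characters, so this is a negated character class,
# not a list of excluded prefixes.
naive = re.compile(r"\[\[[^(Special:MyLanguage|File:|Image:|Category:)]")

def naive_insert(text):
    # Hypothetical helper: replaces "[[" plus the one character the class matched.
    return naive.sub("[[Special:MyLanguage/", text)

print(naive_insert("[[ACBDEF]]"))  # the leading "A" is swallowed
print(naive_insert("[[Foo]]"))     # "F" is in the class, so nothing matches
```

The second call shows the other side of the problem: any link whose first character happens to appear in the class is skipped, even though it is not an excluded prefix.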

Upvotes: 2

Views: 65

Answers (4)

Attersson

Reputation: 4866

If you don't want to mess with regexes, here is a simpler solution.

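exclusions = ["Special:MyLanguage/", "Category:", "File:", "Image:"]
# repl_str = "Special:MyLanguage/"

def replace_str(s, repl_str):
    # Leave the string untouched if it contains an excluded prefix.
    for ex in exclusions:
        if ex in s:
            return s
    # Insert repl_str right after the leading "[[".
    return s[:2] + repl_str + s[2:]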

This works provided every string follows the pattern from your question exactly, [[something]], and the text is to be inserted right after the opening [[.

For such a simple case, I find regexes overly complex, especially once lookaheads, lookbehinds and capture groups are involved. Keep it simple when you can, and save the complexity for when it's really needed.
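A slightly stricter variant of the same no-regex idea, as a sketch: it checks only the start of the link target with str.startswith instead of searching the whole string. The names EXCLUDED_PREFIXES and insert_prefix are hypothetical, and the function assumes each input is exactly one [[...]] string.

```python
EXCLUDED_PREFIXES = ("Special:MyLanguage/", "Category:", "File:", "Image:")

def insert_prefix(link, prefix="Special:MyLanguage/"):
    # Assumes link is exactly one "[[...]]" string, as in the question.
    target = link[2:]  # text after the opening "[["
    if target.startswith(EXCLUDED_PREFIXES):
        return link  # already prefixed or in an excluded namespace
    return "[[" + prefix + target

for link in ["[[ACBDEF]]", "[[File:ACBDEF]]", "[[Special:MyLanguage/ACBDEF]]"]:
    print(insert_prefix(link))
```

Anchoring the check at the start means a target such as "My File: notes" still gets the prefix, which a plain substring test would skip.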

Upvotes: 0

Jan

Reputation: 43169

Using a function with excludes:

import re

excludes = ['Special:MyLanguage', 'Category:', 'File:', 'Image:']

s = "[[Special:MyLanguage/text]]\n[[File:text]]\n[[Image:text]]\n[[Category:text]]\n[[Text and ]]"

def analyze(match):
    for exclude in excludes:
        if exclude in match.group(1):
            return '[[{}]]'.format(match.group(1))
    return '[[Special:MyLanguage/{}]]'.format(match.group(1))

rx = re.compile(r'\[\[(.*?)\]\]')

s = rx.sub(analyze, s)
print(s)

This yields

[[Special:MyLanguage/text]]
[[File:text]]
[[Image:text]]
[[Category:text]]
[[Special:MyLanguage/Text and ]]
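Note that the check above uses in, so an exclude appearing anywhere inside the link suppresses the insertion. If only the start of the link should count, a sketch along these lines may be closer to what is wanted (analyze_anchored is a hypothetical name, and the trailing slash and colons in the prefixes are an assumption about the intended exclusions):

```python
import re

excludes = ('Special:MyLanguage/', 'Category:', 'File:', 'Image:')
rx = re.compile(r'\[\[(.*?)\]\]')

def analyze_anchored(match):
    target = match.group(1)
    # str.startswith accepts a tuple and tests each prefix in turn.
    if target.startswith(excludes):
        return match.group(0)  # leave the whole link unchanged
    return '[[Special:MyLanguage/{}]]'.format(target)

s = "[[File:text]] and [[Notes on File: formats]]"
print(rx.sub(analyze_anchored, s))
```

Here the first link is left alone, while the second one is prefixed even though it contains "File:" further in.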

Upvotes: 0

Ajax1234

Reputation: 71451

You can use re.sub and re.findall:

import re
tests = ['[[ACBDEF]]', '[[Special:MyLanguage/ACBDEF]]', '[[Category:ACBDEF]]', '[[File:ACBDEF]]', '[[OneLasttest]]']
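def isvalid(lang):
    # True when the link target does not start with an excluded prefix.
    return not re.findall(r'^Special:MyLanguage/|^File:|^Category:|^Image:', lang)

# Pull out the text between [[ and ]], prefix it when valid, and put it back.
final_results = [re.sub(r'(?<=\[\[)[\w\W]+(?=\]\])', '{}', i).format(*['Special:MyLanguage/'+c if isvalid(c) else c for c in re.findall(r'(?<=\[\[)[\w\W]+(?=\]\])', i)]) for i in tests]
print(final_results)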

Output:

['[[Special:MyLanguage/ACBDEF]]', '[[Special:MyLanguage/ACBDEF]]', '[[Category:ACBDEF]]', '[[File:ACBDEF]]', '[[Special:MyLanguage/OneLasttest]]']
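One caveat, shown as a small sketch: [\w\W]+ is greedy, so this approach fits test strings that hold a single link each. If one string contained several links, the greedy form would span from the first [[ to the last ]]; the lazy form [\w\W]+? keeps them apart. The sample text below is made up for illustration.

```python
import re

text = '[[ACBDEF]] and [[OneLasttest]]'

# Greedy: runs from the first "[[" to the last "]]".
greedy = re.findall(r'(?<=\[\[)[\w\W]+(?=\]\])', text)
# Lazy: stops at the first "]]" after each "[[".
lazy = re.findall(r'(?<=\[\[)[\w\W]+?(?=\]\])', text)

print(greedy)
print(lazy)
```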

Upvotes: 0

Wiktor Stribiżew

Reputation: 626747

You may use a negative lookahead to make sure those substrings do not occur right after [[:

(\[\[)(?!Special:MyLanguage/|File:|Image:|Category:)(.*?]])

and replace with \1Special:MyLanguage/\2.

Details

  • (\[\[) - Group 1: the [[ substring
  • (?!Special:MyLanguage/|File:|Image:|Category:) - a negative lookahead: the [[ can't be immediately followed by any of the substrings listed in the alternation
  • (.*?]]) - Group 2: zero or more chars other than line break chars, as few as possible, followed by ]].

Python demo:

import re
rx = r"(\[\[)(?!Special:MyLanguage/|File:|Image:|Category:)(.*?]])"
s = "[[Special:MyLanguage/text]]\n[[File:text]]\n[[Image:text]]\n[[Category:text]]\n[[Text and ]]"
res = re.sub(rx, r"\1Special:MyLanguage/\2", s)
print(res)

Output:

[[Special:MyLanguage/text]]
[[File:text]]
[[Image:text]]
[[Category:text]]
[[Special:MyLanguage/Text and ]]
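If the list of excluded prefixes is likely to change, the same lookahead can be assembled from a list rather than typed by hand. This is only a sketch: EXCLUDED_PREFIXES, build_pattern and add_prefix are hypothetical names, and re.escape is used so that any regex metacharacters in a prefix are treated literally.

```python
import re

EXCLUDED_PREFIXES = ["Special:MyLanguage/", "File:", "Image:", "Category:"]

def build_pattern(prefixes):
    # Join the escaped prefixes into the alternation of the negative lookahead.
    alternation = "|".join(re.escape(p) for p in prefixes)
    return re.compile(r"(\[\[)(?!" + alternation + r")(.*?\]\])")

def add_prefix(text, prefix="Special:MyLanguage/", prefixes=EXCLUDED_PREFIXES):
    pattern = build_pattern(prefixes)
    # A function replacement avoids backslash handling in the prefix string.
    return pattern.sub(lambda m: m.group(1) + prefix + m.group(2), text)

print(add_prefix("[[File:text]] [[Text and ]]"))
```

Because an already prefixed link is itself excluded, running add_prefix twice on the same text leaves the result of the first pass unchanged.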

Upvotes: 1
