Reputation: 23
I need to insert the string Special:MyLanguage/ into strings like [[ABCDEF]] so that it becomes [[Special:MyLanguage/ABCDEF]].
The problem is that I need to exclude certain matches where Special:MyLanguage/ should not be inserted, namely links that already start with Special:MyLanguage/, Category:, File: or Image:.
So simply replacing \[\[ with \[\[Special:MyLanguage/ unfortunately does not work. Replacing \[\[[^(Special:MyLanguage|File:|Image:|Category:)] does not work either, because the match includes the first character after the brackets (a match would be [[A). I've read a lot of tutorials and experimented with $1, \G and similar things, but I'm still scratching my head.
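To make the two failed attempts concrete, here is a minimal Python sketch using the standard re module (Python is what the answers below use; the sample string s is made up for illustration). The second pattern fails for a more basic reason than the extra character: [^(...)] is a negated character class, which matches exactly one character that is not in the listed set, not a whole word. It therefore consumes the A, and it also skips any link whose first letter happens to be in the set, such as [[Foo]].

```python
import re

# Made-up sample: one plain link, one plain link starting with "F", one excluded link
s = "[[ABCDEF]] [[Foo]] [[Category:Foo]]"

# Attempt 1: an unconditional replacement also prefixes links that must be skipped
naive = re.sub(r'\[\[', '[[Special:MyLanguage/', s)
print(naive)

# Attempt 2: [^(...)] is a negated character class. It matches ONE character
# that is not any of ( ) | : S p e c i a l M y L n g u F I m C t o r,
# so it consumes that character and never tests for whole words.
bad_class = re.compile(r'\[\[[^(Special:MyLanguage|File:|Image:|Category:)]')
print(bad_class.findall(s))
```

The first result prefixes every link, including Category:Foo; the second only finds [[A and silently misses [[Foo]] because F is in the class.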
Upvotes: 2
Views: 65
Reputation: 4866
If you don't want to mess with regexes, here is a simpler solution:
exclusions = ["Special:MyLanguage/", "Category:", "File:", "Image:"]

def replace_str(s, repl_str="Special:MyLanguage/"):
    # Leave the string unchanged if any excluded word appears in it
    for ex in exclusions:
        if ex in s:
            return s
    # Otherwise insert repl_str right after the opening "[["
    return s[:2] + repl_str + s[2:]
This assumes every string follows the pattern from your question exactly ([[something]]) and that the prefix goes right after the opening [[.
For such a simple case, I find regexes overly complex, especially once lookaheads, lookbehinds and capture groups are involved. Keep it simple when you can and save algorithmic complexity for when it's really needed, just saying.
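One caveat with the `in` check in replace_str: it skips a link if an excluded word appears anywhere in it, not only at the start. Below is a minimal, still regex-free sketch of a prefix-only check; add_prefix and EXCLUSIONS are hypothetical names, and it relies on str.startswith accepting a tuple of prefixes.

```python
# Hypothetical variant: exclude a link only when its text *starts* with
# one of these prefixes, instead of containing it anywhere.
EXCLUSIONS = ("Special:MyLanguage/", "Category:", "File:", "Image:")

def add_prefix(link, prefix="Special:MyLanguage/"):
    inner = link[2:]  # text after the opening "[["
    # str.startswith accepts a tuple and is True if any element matches
    if inner.startswith(EXCLUSIONS):
        return link
    return "[[" + prefix + inner

print(add_prefix("[[ABCDEF]]"))
print(add_prefix("[[File:photo.png]]"))
print(add_prefix("[[Notes on File: handling]]"))
```

The third call gets the prefix here, whereas an `in` check would leave it unchanged because "File:" appears mid-link.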
Upvotes: 0
Reputation: 43169
Using a function with excludes:
import re
excludes = ['Special:MyLanguage', 'Category:', 'File:', 'Image:']
s = "[[Special:MyLanguage/text]]\n[[File:text]]\n[[Image:text]]\n[[Category:text]]\n[[Text and ]]"
def analyze(match):
    for exclude in excludes:
        if exclude in match.group(1):
            return '[[{}]]'.format(match.group(1))
    return '[[Special:MyLanguage/{}]]'.format(match.group(1))
rx = re.compile(r'\[\[(.*?)\]\]')
s = rx.sub(analyze, s)
print(s)
This yields
[[Special:MyLanguage/text]]
[[File:text]]
[[Image:text]]
[[Category:text]]
[[Special:MyLanguage/Text and ]]
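Because rx.sub calls analyze once per match and the lazy .*? stops at the first ]], the same approach also works on links embedded in running text, several per line. The sketch below is a slightly condensed version of analyze (using any() and returning match.group(0) for excluded links); the sample text is invented for illustration.

```python
import re

excludes = ['Special:MyLanguage', 'Category:', 'File:', 'Image:']
rx = re.compile(r'\[\[(.*?)\]\]')

def analyze(match):
    inner = match.group(1)
    # any() replaces the explicit loop; group(0) is the whole [[...]] match,
    # so an excluded link is returned exactly as it was found
    if any(exclude in inner for exclude in excludes):
        return match.group(0)
    return '[[Special:MyLanguage/{}]]'.format(inner)

text = "See [[Main Page]] and [[File:logo.png]], then [[Help]]."
print(rx.sub(analyze, text))
```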
Upvotes: 0
Reputation: 71451
You can use re.sub and re.findall:
import re
tests = ['[[ACBDEF]]', '[[Special:MyLanguage/ACBDEF]]', '[[Category:ACBDEF]]', '[[File:ACBDEF]]', '[[OneLasttest]]']
def isvalid(lang):
    return not re.findall(r'^Special:MyLanguage/|^File|^Category|^Image', lang)
final_results = [re.sub(r'(?<=\[\[)[\w\W]+(?=\]\])', '{}', i).format(*['Special:MyLanguage/'+c if isvalid(c) else c for c in re.findall(r'(?<=\[\[)[\w\W]+(?=\]\])', i)]) for i in tests]
print(final_results)
Output:
['[[Special:MyLanguage/ACBDEF]]', '[[Special:MyLanguage/ACBDEF]]', '[[Category:ACBDEF]]', '[[File:ACBDEF]]', '[[Special:MyLanguage/OneLasttest]]']
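The one-liner above substitutes a {} placeholder and then fills it with str.format, which is hard to read. A possibly clearer sketch passes a function to re.sub instead, keeping the same lookbehind/lookahead pattern and the same exclusion list as isvalid() (so, like isvalid(), it also skips anything starting with File, Category or Image even without a colon). Like the original, the greedy [\w\W]+ assumes one link per string. PREFIX, INNER, EXCLUDED and prefix_inner are illustrative names.

```python
import re

PREFIX = 'Special:MyLanguage/'
# Same lookbehind/lookahead pattern as the answer: the text between [[ and ]]
INNER = re.compile(r'(?<=\[\[)[\w\W]+(?=\]\])')
# Same alternatives as isvalid(); re.match anchors them at the start
EXCLUDED = re.compile(r'Special:MyLanguage/|File|Category|Image')

def prefix_inner(m):
    text = m.group(0)
    return text if EXCLUDED.match(text) else PREFIX + text

tests = ['[[ACBDEF]]', '[[Special:MyLanguage/ACBDEF]]', '[[Category:ACBDEF]]', '[[File:ACBDEF]]', '[[OneLasttest]]']
final_results = [INNER.sub(prefix_inner, t) for t in tests]
print(final_results)
```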
Upvotes: 0
Reputation: 626747
You may use a negative lookahead to make sure those substrings do not occur right after [[:
(\[\[)(?!Special:MyLanguage/|File:|Image:|Category:)(.*?]])
and replace with \1Special:MyLanguage/\2. See the regex demo.
Details
(\[\[) - Group 1: [[ substring
(?!Special:MyLanguage/|File:|Image:|Category:) - the [[ can't be followed with any of the substrings listed in the alternation group
(.*?]]) - Group 2: any 0+ chars other than line break chars, as few as possible, followed with ]].
import re
rx = r"(\[\[)(?!Special:MyLanguage/|File:|Image:|Category:)(.*?]])"
s = "[[Special:MyLanguage/text]]\n[[File:text]]\n[[Image:text]]\n[[Category:text]]\n[[Text and ]]"
res = re.sub(rx, r"\1Special:MyLanguage/\2", s)
print(res)
Output:
[[Special:MyLanguage/text]]
[[File:text]]
[[Image:text]]
[[Category:text]]
[[Special:MyLanguage/Text and ]]
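Since a negative lookahead is zero-width, it checks what follows [[ without consuming it, which is exactly what the [^(...)] attempt in the question could not do. That makes the capture groups optional: a sketch that matches only the two brackets and replaces them with [[Special:MyLanguage/ gives the same output on the sample string. One difference: unlike the pattern above, it does not require a closing ]], so it would also prefix an unterminated [[.

```python
import re

# Match only "[[" when it is NOT followed by an excluded prefix;
# the lookahead inspects the following text without consuming it.
rx_open = re.compile(r'\[\[(?!Special:MyLanguage/|File:|Image:|Category:)')

s = "[[Special:MyLanguage/text]]\n[[File:text]]\n[[Image:text]]\n[[Category:text]]\n[[Text and ]]"
result = rx_open.sub('[[Special:MyLanguage/', s)
print(result)
```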
Upvotes: 1