Reputation: 489
I have a text in which digits appear in all sorts of positions (inside words, as standalone numbers, in decimals). For example,
text = "hello23 the2e are 13 5.12apples *specially_x00123 named 31st"
I want to replace every digit with '#', except the digits inside a special token that starts with *, followed by a word, an underscore, a lowercase letter, and a number, i.e. one matching \*\w+_[a-z]\d+ (for example, *specially_x00123).
I've tried lookaround syntax and non-capturing groups, but I can't find a way to get exactly this result:
text_cleaned = "hello## the#e are ## #.##apples *specially_x00123 named ##st"
I tried a pattern with a negative lookbehind:
p1 = r'\d(?<!\*\w+_\w+)'
Python's re module rejects it with the error "look-behind requires fixed-width pattern".
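A minimal sketch of why p1 fails, assuming Python 3's built-in re module: a lookbehind must match a fixed number of characters, and \w+ can match any number, so compiling p1 raises re.error. The variable name error_message is only for illustration.

```python
import re

p1 = r'\d(?<!\*\w+_\w+)'

try:
    re.compile(p1)
    error_message = None
except re.error as exc:
    # \w+ can match any number of characters, so the lookbehind has no fixed width
    error_message = str(exc)

print(error_message)
```

If a variable-width lookbehind is really needed, the third-party regex package (not the standard library) accepts one; with the built-in re, the usual workaround is to match the exception as its own alternative instead.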
I also tried a non-capturing group:
p2 = r'(?:\*[a-z]+_\w+)\b|\d'
It matches both the special token (*specially_x00123) and every digit, so both get replaced. I suspect something like this belongs in the solution, but I can't see how to keep the token while replacing the digits. Any ideas?
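For illustration, here is what p2 does on the sample text with Python's re (a sketch; the names matches and replaced are only illustrative). Because the group is non-capturing, re.findall returns whole matches, and the special token shows up as a single match next to the individual digits, so a plain '#' replacement collapses the token into one '#':

```python
import re

text = "hello23 the2e are 13 5.12apples *specially_x00123 named 31st"
p2 = r'(?:\*[a-z]+_\w+)\b|\d'

# With no capturing groups, findall returns every whole match:
# the special token comes back as one match alongside the single digits.
matches = re.findall(p2, text)
print(matches)

# A plain '#' replacement therefore replaces the whole token with a single '#'.
replaced = re.sub(p2, "#", text)
print(replaced)  # hello## the#e are ## #.##apples # named ##st
```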
Upvotes: 1
Views: 1235
Reputation: 163362
What you might do is capture the digit in a capturing group (\d) and use a callback in the replacement that checks the first capturing group. If group 1 matched, replace the match with a #; otherwise return the match unchanged.
As \w+ also matches an underscore, you might first match word characters other than the underscore, using the negated character class [^\W_\n]+:
\*[^\W_\n]+_[a-z]\d+\b|(\d)
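import re
text = "hello23 the2e are 13 5.12apples *specially_x00123 named 31st"
pattern = r"\*[^\W_\n]+_[a-z]\d+\b|(\d)"
# group(1) is set only when a lone digit matched: replace it with '#'.
# Otherwise the special token matched: return it unchanged.
print(re.sub(pattern, lambda m: "#" if m.group(1) else m.group(), text))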
Result
hello## the#e are ## #.##apples *specially_x00123 named ##st
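To illustrate why the word part excludes the underscore, here is a small comparison on a hypothetical string *a_b_x1 (not from the question). Because \w matches _, \w+ can backtrack and still find a match, whereas [^\W_\n]+ stops at the first underscore:

```python
import re

sample = "*a_b_x1"  # hypothetical input, chosen to contain two underscores

# \w includes '_', so \w+ can absorb "a_b" and still leave "_x1" for the rest.
with_w = re.search(r"\*\w+_[a-z]\d+", sample)

# [^\W_\n] means "a word character that is not '_' (or newline)",
# so the name part must stop at the first '_'.
without_underscore = re.search(r"\*[^\W_\n]+_[a-z]\d+", sample)

print(with_w.group() if with_w else None)  # *a_b_x1
print(without_underscore)                  # None
```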
Upvotes: 2
Reputation: 27723
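One option is to treat the string as two parts: the part before the star and the part from the star onward. (\d) captures each digit before the star, which we replace with #, while (\*.*) captures the star and the rest of the line, which we put back unchanged as group 2. Because .* runs to the end of the line, digits after the special token (such as 31st in the question) are not replaced. The expression is: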
(\d)|(\*.*)
import re
regex = r"(\d)|(\*.*)"
test_str = "hello23 the2e are 13 5.12apples *specially_x00123 named"
# Group 1 is a digit before the '*'; group 2 is the '*' plus the rest of the line.
# A callback emits '#' only for digits; the template "#\\2" would also
# put a '#' in front of the '*' match.
result = re.sub(regex, lambda m: "#" if m.group(1) else m.group(2), test_str)
print(result)
const regex = /(\d)|(\*.*)/gm;
const str = `hello23 the2e are 13 5.12apples *specially_x00123 named`;
// Replace a digit (group 1) with '#'; return the '*...' tail (group 2) unchanged.
// The template `#$2` would also prefix the '*' match with '#'.
const result = str.replace(regex, (match, digit, rest) => (digit !== undefined ? "#" : rest));
console.log('Substitution result: ', result);
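As a check on the question's full sample (which, unlike test_str above, ends in 31st), the sketch below runs the same regex with two kinds of replacement using Python's re. It shows that a template replacement like "#\\2" also emits a '#' in front of the '*' match, and that even with a callback, digits after the special token are left unchanged because (\*.*) consumes the rest of the line:

```python
import re

text = "hello23 the2e are 13 5.12apples *specially_x00123 named 31st"
regex = r"(\d)|(\*.*)"

# Template replacement: '#' is emitted for every match, including the '*...' match.
templated = re.sub(regex, "#\\2", text)

# Callback replacement: '#' only for digits; the '*...' tail is returned unchanged.
with_callback = re.sub(regex, lambda m: "#" if m.group(1) else m.group(2), text)

print(templated)      # hello## the#e are ## #.##apples #*specially_x00123 named 31st
print(with_callback)  # hello## the#e are ## #.##apples *specially_x00123 named 31st
```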
Upvotes: 0