Reputation: 3658
My goal is to select strings like hello_kitty.dat from Lorem 'hello_kitty.dat' ipsum.
I have written this snippet, which works to some extent for smaller strings: from teststring, select one or more (+) word characters (\w) before a dot (\.) with three word characters after that (\w{3}), and substitute the selection with x.
>>> import re
>>> teststring = "Lorem 'hello_kitty.dat' ipsum."
>>> print(re.sub(r'\w+\.\w{3}', "x", teststring))
Lorem 'x' ipsum.
But how would I modify the code to select everything between single quotes, even if it does not follow my pattern completely after the \w{3}? teststring could be "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92", but I wouldn't want to select hello_kitty.cmd?command92 in this case since it's outside of single quotes.
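To make the problem concrete, here is a small sketch of what the original pattern does with the longer string. The variable names pattern and result are only illustrative.

```python
import re

# Original pattern from the question: word chars, a dot, then three word chars.
pattern = r'\w+\.\w{3}'
teststring = "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"

# The pattern knows nothing about quotes, so it replaces both file names
# and leaves the trailing "?command9x" parts behind.
result = re.sub(pattern, "x", teststring)
print(result)
# Lorem 'x?command91' ipsum x?command92
```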
Upvotes: 2
Views: 2286
Reputation: 43159
To put my two cents in, you could use:
'[^']+' # quotes with a negated character class in between
In Python, this would be:
import re
string = """
"Lorem 'hello_kitty.dat' ipsum."
"Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"
"""
rx = re.compile(r"'[^']+'")
string = rx.sub("x", string)
print(string)
# "Lorem x ipsum."
# "Lorem x ipsum hello_kitty.cmd?command92"
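Note that the substitution above also replaces the quotes themselves. If the quotes should stay, or if only the quoted text needs to be extracted, a capture group is one possible variation. This is a sketch, not part of the original answer; the names quoted, found and replaced are made up for illustration.

```python
import re

text = "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"

# Same negated character class, with a group around the quoted content.
quoted = re.compile(r"'([^']+)'")

# findall returns only the group, i.e. the text between the quotes.
found = quoted.findall(text)
print(found)
# ['hello_kitty.cmd?command91']

# Putting the quotes back in the replacement keeps them in the output.
replaced = quoted.sub("'x'", text)
print(replaced)
# Lorem 'x' ipsum hello_kitty.cmd?command92
```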
Upvotes: 1
Reputation: 1365
Simply use a non-greedy regular expression:
import re
teststring = "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"
print(re.sub(r"'.*?'", "'x'", teststring))
Returns Lorem 'x' ipsum hello_kitty.cmd?command92
The regular expression '.*?' matches everything between single quotes, but takes the shortest possible string.
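The difference between the greedy and the non-greedy form only shows up when a string contains more than one quoted part. A short comparison, using a made-up test string with two quoted parts:

```python
import re

# Hypothetical input with two separate quoted parts.
text = "Lorem 'a.dat' ipsum 'b.cmd?c' dolor"

# Greedy: .* runs from the first quote to the LAST quote in the string.
greedy = re.sub(r"'.*'", "'x'", text)
print(greedy)
# Lorem 'x' dolor

# Non-greedy: .*? stops at the nearest closing quote, so each part is matched separately.
lazy = re.sub(r"'.*?'", "'x'", text)
print(lazy)
# Lorem 'x' ipsum 'x' dolor
```

Keep in mind that . does not match a newline unless re.DOTALL is passed, so quoted text spanning several lines would not be matched by either form as written.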
Upvotes: 0
Reputation: 627020
You may use:
import re
teststring = "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"
print(re.sub(r"'\w+\.\w{3}[^']*'", "'x'", teststring))
# => Lorem 'x' ipsum hello_kitty.cmd?command92
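Unlike a generic "anything between quotes" pattern, this one still requires the quoted text to start like a file name. A small sketch of the difference, using a made-up test string in which the second quoted part is not a file name:

```python
import re

# Hypothetical input: the second quoted part contains no "name.ext" prefix.
text = "Lorem 'hello_kitty.cmd?command91' ipsum 'not a file' dolor"

# File-name-aware pattern: only the first quoted part is replaced.
specific = re.sub(r"'\w+\.\w{3}[^']*'", "'x'", text)
print(specific)
# Lorem 'x' ipsum 'not a file' dolor

# Generic pattern for comparison: every quoted part is replaced.
generic = re.sub(r"'[^']+'", "'x'", text)
print(generic)
# Lorem 'x' ipsum 'x' dolor
```

Which of the two is preferable depends on whether quoted text that is not a file name should be left alone.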
The pattern now matches:
' - a single quote
\w+ - 1 or more word chars
\. - a dot
\w{3} - 3 word chars
[^']* - a negated character class matching any 0+ chars other than a single quote
' - a single quote.
Upvotes: 1