Clone

Reputation: 3658

Regex - select an expression between single quotes

My goal is to select strings like hello_kitty.dat from Lorem 'hello_kitty.dat' ipsum.

I have written this snippet, which works to some extent for smaller strings: from teststring, it selects one or more (+) word characters (\w) before a dot (\.) followed by three word characters (\w{3}), and substitutes the selection with x.

>>> import re
>>> teststring = "Lorem 'hello_kitty.dat' ipsum."
>>> print(re.sub(r'\w+\.\w{3}', "x", teststring))
Lorem 'x' ipsum.

But how would I modify the code to select everything between single quotes, even if the content does not follow my pattern completely after the \w{3}?

teststring could be "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92", but I wouldn't want to select hello_kitty.cmd?command92 in this case since it's outside of the single quotes.

Upvotes: 2

Views: 2286

Answers (3)

Jan

Reputation: 43159

To put my two cents in, you could use:

'[^']+' # quotes with a negated character class in between


Which in Python would be:

import re

string = """
"Lorem 'hello_kitty.dat' ipsum."
"Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"
"""

rx = re.compile(r"'[^']+'")
string = rx.sub("x", string)
print(string)

# "Lorem x ipsum."
# "Lorem x ipsum hello_kitty.cmd?command92"
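If the goal is to pull out the quoted text rather than replace it, a capturing group inside the quotes may be enough. This is a minimal sketch that assumes the same negated character class as above; the variable names are only illustrative.

```python
import re

teststring = "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"

# The parentheses capture only what sits between the quotes,
# so findall returns the content without the quotes themselves.
quoted = re.findall(r"'([^']+)'", teststring)
print(quoted)  # ['hello_kitty.cmd?command91']
```

The unquoted hello_kitty.cmd?command92 is skipped because the pattern requires a quote on both sides.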

Upvotes: 1

acidtobi

Reputation: 1365

Simply use a non-greedy regular expression:

import re
teststring = "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"
print(re.sub(r"'.*?'", "'x'", teststring))

This returns Lorem 'x' ipsum hello_kitty.cmd?command92

The regular expression '.*?' matches everything between two single quotes. Because the quantifier is non-greedy, it stops at the nearest closing quote instead of running to the last quote in the string.
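The difference only shows up when a string holds more than one quoted part. A small sketch, using a made-up sample string, that compares the greedy and non-greedy forms:

```python
import re

# Hypothetical sample with two quoted parts.
sample = "Lorem 'first.dat' ipsum 'second.cmd' dolor"

# Greedy: .* runs to the last quote in the string, swallowing both quoted parts.
greedy = re.sub(r"'.*'", "'x'", sample)

# Non-greedy: .*? stops at the nearest closing quote, so each part is replaced separately.
lazy = re.sub(r"'.*?'", "'x'", sample)

print(greedy)  # Lorem 'x' dolor
print(lazy)    # Lorem 'x' ipsum 'x' dolor
```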

Upvotes: 0

Wiktor Stribiżew

Reputation: 627020

You may use:

import re
teststring = "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"
print(re.sub(r"'\w+\.\w{3}[^']*'", "'x'", teststring))
# => Lorem 'x' ipsum hello_kitty.cmd?command92

The pattern now matches:

  • ' - a single quote
  • \w+ - 1 or more word chars
  • \. - a dot
  • \w{3} - 3 word chars
  • [^']* - a negated character class matching any 0+ chars other than a single quote
  • ' - a single quote.
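Because this pattern still requires the name.ext shape right after the opening quote, it leaves other quoted text alone. A rough sketch with a made-up sample string, compared against a plain negated-class pattern that accepts any quoted content:

```python
import re

# Hypothetical sample: one quoted word without an extension, one quoted file name.
sample = "Say 'hi' to 'hello_kitty.cmd?command91' and hello_kitty.cmd?command92"

# Strict: the quoted text must start with word chars, a dot, and three word chars.
strict = re.sub(r"'\w+\.\w{3}[^']*'", "'x'", sample)

# Loose: any non-empty quoted text is replaced.
loose = re.sub(r"'[^']+'", "'x'", sample)

print(strict)  # Say 'hi' to 'x' and hello_kitty.cmd?command92
print(loose)   # Say 'x' to 'x' and hello_kitty.cmd?command92
```

Which one fits depends on whether quoted text that is not a file name should also be selected.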

Upvotes: 1
