Reputation: 1438
I'm wondering if there is some way to find only the second quote of each pair in a string that contains paired quotes.
So if I have a string like '"aaaaa"' or just '""', I want to find only the last '"' in it. If I have '"aaaa""aaaaa"aaaa""', I want only the second, fourth and sixth '"'s.
But if I have something like '"aaaaaaaa' or 'aaa"aaa', I don't want to find anything, since there are no paired quotes. If I have '"aaa"aaa"', I want to find only the second '"', since the third '"' has no pair.
I've tried to use a lookbehind, but it doesn't work with quantifiers (Python's re module only accepts fixed-width lookbehind patterns), so my failed attempt was '(?<=\"a*)\"'.
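A minimal sketch (Python 3, built-in re module only) confirming why the attempt fails: re rejects a lookbehind whose width can vary, such as one containing a*, while a fixed-width lookbehind compiles fine. The helper name compile_error is hypothetical, introduced only for this illustration.

```python
import re


def compile_error(pattern):
    """Hypothetical helper: return the re.error message, or None if the pattern compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# Variable-width lookbehind ("a*" can match any number of characters): rejected
print(compile_error(r'(?<="a*)"'))

# Fixed-width lookbehind (exactly one character before the match): accepted, prints None
print(compile_error(r'(?<=")a'))
```

Note that even an engine allowing variable-width lookbehind would not fix this pattern: "a* can match just a single preceding quote, so an opening quote that directly follows a closing one (index 6 in '"aaaa""aaaaa"') would be matched too.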
Upvotes: 3
Views: 1168
Reputation: 1243
If what you need is to replace the second quote, you can also match the whole pair and put the part before the second quote into a capture group. Then substituting each match with the first group plus your replacement string achieves the goal.
For example, this regex matches everything before the second quote and puts it into a group:
(\"[^"]*)\"
If you replace the whole match (which includes the second quote) with only the value of the capture group (which does not include the second quote), you effectively cut the second quote off.
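import re

# Group 1 captures an opening quote plus everything up to (not including) the next quote
p = re.compile(r'(\"[^"]*)\"')
test_str = '"test1"test2"test3"'
subst = r"\1"
# Replacing each match with group 1 drops the second quote of every pair
result = re.sub(p, subst, test_str)
print(result)  # result -> "test1test2"test3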
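The same capture-group idea covers replacing the second quote with another string instead of deleting it. A minimal sketch, assuming you want each closing quote turned into a single quote (the replacement character is only an illustration). Writing the group reference as \g<1> keeps it unambiguous even when the replacement text starts with a digit, where \1 followed by 0 would be read as a reference to group 10.

```python
import re

# Group 1: an opening quote plus everything up to (not including) the next quote.
# The trailing '"' outside the group is the second quote of the pair.
pair = re.compile(r'("[^"]*)"')

test_str = '"test1"test2"test3"'

# Keep group 1, then append the replacement where the second quote was.
result = pair.sub(r"\g<1>'", test_str)
print(result)  # -> "test1'test2"test3'
```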
Upvotes: 1
Reputation: 12411
A parser is probably better, but depending on what you want to get out of it, there are other ways. If you need the data between the quotes:
>>> import re
>>> re.findall(r'".*?"', '"aaaa""aaaaa"aaaa""')
['"aaaa"', '"aaaaa"', '""']
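If you want only the text between the quotes, without the quote characters, a capturing group makes findall return the group's contents instead of the whole match. A small sketch using the same example string:

```python
import re

s = '"aaaa""aaaaa"aaaa""'
# With exactly one capturing group, findall returns that group's text for each match
inner = re.findall(r'"(.*?)"', s)
print(inner)  # -> ['aaaa', 'aaaaa', '']
```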
If you need the indices, you could use a generator (or an equivalent loop) like this:
def count_quotes(mystr):
    count = 0
    for i, x in enumerate(mystr):
        if x == '"':
            count += 1
            # every even-numbered quote closes a pair
            if count % 2 == 0:
                yield i

>>> list(count_quotes('"aaaa""aaaaa"aaaa""'))
[5, 12, 18]
Upvotes: 0
Reputation: 122169
You don't really need regex for this. You can do:
[i for i, c in enumerate(s) if c == '"'][1::2]
to get the index of every second '"' (the closing quote of each pair). Example usage:
>>> for s in ['"aaaaa"', '"aaaa""aaaaa"aaaa""', 'aaa"aaa', '"aaa"aaa"']:
...     print(s, [i for i, c in enumerate(s) if c == '"'][1::2])
...
"aaaaa" [6]
"aaaa""aaaaa"aaaa"" [5, 12, 18]
aaa"aaa []
"aaa"aaa" [4]
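Because the quote indices alternate opening/closing, zipping the [::2] and [1::2] slices gives each pair's (opening, closing) positions; zip stops at the shorter slice, so an unpaired trailing quote is dropped automatically. A small sketch; the function name quote_pairs is hypothetical:

```python
def quote_pairs(s):
    """Hypothetical helper: (opening, closing) index pairs of '"' in s."""
    quotes = [i for i, c in enumerate(s) if c == '"']
    # Even-numbered slots open a pair, odd-numbered slots close it
    return list(zip(quotes[::2], quotes[1::2]))


for s in ['"aaaaa"', '"aaaa""aaaaa"aaaa""', 'aaa"aaa', '"aaa"aaa"']:
    print(s, quote_pairs(s))
```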
Upvotes: 2
Reputation: 24802
Please read my answer about why you don't want to use regular expressions for such a problem, even though you can do this kind of non-regular job with them.
OK, then you probably want one of the solutions I give in the linked answer, where you would use a recursive regex to match all the matching pairs (note that Python's built-in re module does not support recursive patterns; the third-party regex module does).
Edit: the following was written before the question was updated, when it asked only for the second double quote.
Though if you only want to find the second double quote in a string, you don't need regexps:
>>> s1 = 'aoeu"aoeu'
>>> s2 = 'aoeu"aoeu"aoeu'
>>> s3 = 'aoeu"aoeu"aoeu"aoeu'
>>> def find_second_quote(s):
...     pos_quote_1 = s.find('"')  # index of the first quote
...     if pos_quote_1 == -1:
...         return -1
...     # search only the text after the first quote
...     pos_quote_2 = s[pos_quote_1+1:].find('"')
...     if pos_quote_2 == -1:
...         return -1
...     # convert the slice-relative index back to an index in s
...     return pos_quote_1+1+pos_quote_2
...
>>> find_second_quote(s1)
-1
>>> find_second_quote(s2)
9
>>> find_second_quote(s3)
9
Here it returns either -1 if there's no second quote, or the position of the second quote if there is one.
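For the updated question (the second quote of every pair, not just the first pair), the same find-based idea can be repeated in a loop. A minimal sketch; it passes a start index to str.find instead of slicing, and the name second_quote_positions is hypothetical:

```python
def second_quote_positions(s):
    """Hypothetical helper: indices of the closing quote of each '"' pair in s."""
    positions = []
    start = 0
    while True:
        opening = s.find('"', start)
        if opening == -1:
            break  # no more opening quotes
        closing = s.find('"', opening + 1)
        if closing == -1:
            break  # unpaired final quote: ignore it
        positions.append(closing)
        start = closing + 1  # continue searching after this pair
    return positions


print(second_quote_positions('"aaaa""aaaaa"aaaa""'))  # -> [5, 12, 18]
print(second_quote_positions('"aaa"aaa"'))  # -> [4]
```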
Upvotes: 0
Reputation: 56714
import re
reg = re.compile(r'(?:\").*?(\")')
then
for match in reg.findall('"this is", "my test"'):
    print(match)
gives
"
"
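Because the pattern has one capturing group, findall returns only the contents of group 1, which is always the quote character itself, so the output above does not say where the quotes are. A minimal sketch using finditer with the same pattern, where match.start(1) gives the index of each second quote:

```python
import re

reg = re.compile(r'(?:\").*?(\")')

# finditer yields match objects; start(1) is the index of the captured closing quote
positions = [m.start(1) for m in reg.finditer('"this is", "my test"')]
print(positions)  # -> [8, 19]
```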
Upvotes: 1