Reputation: 1438

Python regex to find only second quotes of paired quotes

I am wondering whether there is a way to find only the second quote of each pair in a string that contains paired quotes.

So if I have a string like '"aaaaa"' or just '""', I want to find only the last '"' in it. If I have '"aaaa""aaaaa"aaaa""', I want only the second, fourth and sixth '"'. But if I have something like '"aaaaaaaa' or 'aaa"aaa', I don't want to find anything, since there are no paired quotes. If I have '"aaa"aaa"', I want to find only the second '"', since the third '"' has no pair.

I've tried to use a lookbehind, but it doesn't work with quantifiers (Python's re module requires a fixed-width lookbehind), so my failed attempt was '(?<=\"a*)\"'.

Upvotes: 3

Answers (5)

Caio Oliveira

Reputation: 1243

If what you need is to change the second quote, you can also match the whole pair and put everything before the second quote into a capture group. Substituting the match with the first capture group followed by the replacement string then achieves the goal.

For example, this regex matches everything up to the second quote and puts it into a group:

(\"[^"]*)\"

If you replace the whole match (which includes the second quote) with only the value of the capture group (which does not include the second quote), you simply cut the second quote off.

Example:

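import re

# Group 1 captures the opening quote and the text up to the closing quote.
p = re.compile(r'("[^"]*)"')
test_str = '"test1"test2"test3"'
subst = r"\1"  # keep group 1 only, dropping the closing quote

result = p.sub(subst, test_str)
print(result)  # -> "test1test2"test3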
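If the goal is to change the closing quote rather than drop it, the same pattern can be reused with a replacement built from group 1 plus the new text. This is only a sketch: the helper name `replace_closing_quotes` and the `>` marker are made up for illustration.

```python
import re

# Same pattern as above: group 1 is the opening quote plus the quoted text.
PAIR = re.compile(r'("[^"]*)"')

def replace_closing_quotes(text, replacement):
    # Hypothetical helper: keep group 1 and swap only the closing quote.
    # A function is used as the replacement so that any backslashes in
    # `replacement` are inserted literally.
    return PAIR.sub(lambda m: m.group(1) + replacement, text)

print(replace_closing_quotes('"test1"test2"test3"', '>'))  # "test1>test2"test3>
print(replace_closing_quotes('"aaa"aaa"', '>'))            # "aaa>aaa"
```

The unpaired third quote in the last call is left alone, because the pattern only matches complete pairs.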

Upvotes: 1

Corley Brigman

Reputation: 12411

A parser is probably better, but depending on what you want to get out of it, there are other ways. If you need the data between the quotes:

import re
re.findall(r'".*?"', '"aaaa""aaaaa"aaaa""')
['"aaaa"', '"aaaaa"', '""']
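The call above returns each match including its quotes. If only the text between the quotes is wanted, a capture group can be added to the same pattern. A small sketch (the variable names are arbitrary):

```python
import re

s = '"aaaa""aaaaa"aaaa""'
# With one capture group, findall returns the group instead of the whole match.
inner = re.findall(r'"(.*?)"', s)
print(inner)  # ['aaaa', 'aaaaa', '']
```

Note that `.` does not match a newline by default, so a quoted span that crosses a line break would not be found this way unless `re.DOTALL` is passed.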

If you need the indices, you could use a generator (or an equivalent) like this:

def count_quotes(mystr):
    count = 0
    for i, x in enumerate(mystr):
        if x == '"':
            count += 1
            # every even-numbered quote closes a pair
            if count % 2 == 0:
                yield i

list(count_quotes('"aaaa""aaaaa"aaaa""'))
[5, 12, 18]

Upvotes: 0

jonrsharpe

Reputation: 122169

You don't really need regex for this. You can do:

[i for i, c in enumerate(s) if c == '"'][1::2]

This gives the index of every second '"'. Example usage:

>>> for s in ['"aaaaa"', '"aaaa""aaaaa"aaaa""', 'aaa"aaa', '"aaa"aaa"']:
...     print(s, [i for i, c in enumerate(s) if c == '"'][1::2])
...
"aaaaa" [6]
"aaaa""aaaaa"aaaa"" [5, 12, 18]
aaa"aaa []
"aaa"aaa" [4]

Upvotes: 2

zmo

Reputation: 24802

Please read my answer about why you don't want to use regular expressions for such a problem, even though you can do that kind of non-regular job with it.

Ok then you probably want one of the solutions I give in the linked answer, where you'll want to use a recursive regex to match all the matching pairs.

Edit: the following was written before the update to the question, which originally asked only for the second double quote.

Though if you want to find only the second double quote in a string, you do not need regexps:

>>> s1='aoeu"aoeu'
>>> s2='aoeu"aoeu"aoeu'
>>> s3='aoeu"aoeu"aoeu"aoeu'
>>> def find_second_quote(s):
...     # position of the first quote in s
...     pos_quote_1 = s.find('"')
...     if pos_quote_1 == -1:
...         return -1
...     # position of the next quote, relative to the rest of the string
...     pos_quote_2 = s[pos_quote_1+1:].find('"')
...     if pos_quote_2 == -1:
...         return -1
...     return pos_quote_1+1+pos_quote_2
...
>>> find_second_quote(s1)
-1
>>> find_second_quote(s2)
9
>>> find_second_quote(s3)
9
>>>

Here it returns -1 if there's no second quote, or the position of the second quote if there is one.
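Since the updated question asks for the second quote of every pair, the same `str.find` idea can be put in a loop. This is a rough sketch rather than a tested library routine, and the name `find_closing_quotes` is made up here:

```python
def find_closing_quotes(s):
    """Return the index of the second quote of every complete pair in s."""
    positions = []
    start = 0
    while True:
        first = s.find('"', start)        # opening quote of the next pair
        if first == -1:
            break
        second = s.find('"', first + 1)   # its closing quote, if any
        if second == -1:
            break                         # unpaired quote: stop here
        positions.append(second)
        start = second + 1                # continue after this pair
    return positions

print(find_closing_quotes('"aaaa""aaaaa"aaaa""'))  # [5, 12, 18]
print(find_closing_quotes('"aaa"aaa"'))            # [4]
print(find_closing_quotes('aaa"aaa'))              # []
```

Passing a start index to `str.find` avoids slicing the string, so the returned positions are already relative to the whole string.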

Upvotes: 0

Hugh Bothwell

Reputation: 56714

import re
reg = re.compile(r'(?:\").*?(\")')

then

for match in reg.findall('"this is", "my test"'):
    print(match)

gives

"
"
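The output above is the matched quote characters themselves. If the positions are what you are after, the same pattern should work with `finditer`, reading the start of group 1 from each match. A minimal sketch, with variable names chosen only for illustration:

```python
import re

reg = re.compile(r'(?:").*?(")')
text = '"this is", "my test"'

# m.start(1) is the index of the captured closing quote in the original string.
positions = [m.start(1) for m in reg.finditer(text)]
print(positions)  # [8, 19]
```

As with any pattern using `.`, a quoted span containing a newline will not match unless `re.DOTALL` is used.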

Upvotes: 1
