Reputation: 2142
I am using the below regex to split a text at certain ending punctuation however it doesn't work with quotes.
text = "\"Hello my name is Kevin.\" How are you?"
text.scan(/\S.*?[...!!??]/)
=> ["\"Hello my name is Kevin.", "\" How are you?"]
My goal is to produce the following result, but I am not very good with regex expressions. Any help would be greatly appreciated.
=> ["\"Hello my name is Kevin.\"", "How are you?"]
Upvotes: 0
Views: 117
Reputation: 89649
text.scan(/"(?>[^"\\]+|\\{2}|\\.)*"|\S.*?[...!!??]/)
The idea is to check for quoted parts before. The subpattern is a bit more elaborated than a simple "[^"]*"
to deal with escaped quotes (* see at the end to a more efficient pattern).
pattern details:
" # literal: a double quote
(?> # open an atomic group: all that can be between quotes
[^"\\]+ # all that is not a quote or a backslash
| # OR
\\{2} # 2 backslashes (the idea is to skip even numbers of backslashes)
| # OR
\\. # an escaped character (in particular a double quote)
)* # repeat zero or more times the atomic group
" # literal double quote
| # OR
\S.*?[...!!??]
to deal with single quote to you can add: '(?>[^'\\]+|\\{2}|\\.)*'|
to the pattern (the most efficient), but if you want make it shorter you can write this:
text.scan(/(['"])(?>[^'"\\]+|\\{2}|\\.|(?!\1)["'])*\1|\S.*?[...!!??]/)
where \1
is a backreference to the first capturing group (the found quote) and (?!\1)
means not followed by the found quote.
(*) instead of writing "(?>[^"\\]+|\\{2}|\\.)*"
, you can use "[^"\\]*+(?:\\.[^"\\]*)*+"
that is more efficient.
Upvotes: 2
Reputation: 369494
Add optional quote (["']?
) to the pattern:
text.scan(/\S.*?[...!!??]["']?/)
# => ["\"Hello my name is Kevin.\"", "How are you?"]
Upvotes: 1