Reputation: 13403
I'm splitting a search string so I can use Rails Highlight to highlight the terms. In some cases the same search term will contain both exact-match phrases (in double quotes) and single words, and I'm trying to write a regex that splits both correctly in a single pass.
search_term = 'pizza cheese "ham and pineapple" pepperoni'
search_term.split(/\W+/)
=> ["pizza", "cheese", "ham", "and", "pineapple", "pepperoni"]
search_term.split(/(?=\")\W+/)
=> ["pizza cheese ", "ham and pineapple", "pepperoni"]
I can get ham and pineapple on its own (without the unwanted quotes), and I can easily split all the words, but is there some regex that will return an array like:
search_term.split(🤷♂️)
=> ["pizza", "cheese", "ham and pineapple", "pepperoni"]
Upvotes: 1
Views: 1244
Reputation: 110755
r = /
    (?<=\")  # match a double quote in a positive lookbehind
    (?!\s)   # next char cannot be a whitespace, negative lookahead
    [^"]+    # match one or more characters other than double-quote
    (?<!\s)  # previous char cannot be a whitespace, negative lookbehind
    (?=\")   # match a double quote in a positive lookahead
    |        # or
    \w+      # match one or more word characters
    /x       # free-spacing regex definition mode
str = 'pizza "ham and pineapple" mushroom pepperoni "sausage and anchovies"'
str.scan r
#=> ["pizza", "ham and pineapple", "mushroom", "pepperoni",
# "sausage and anchovies"]
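As a quick check against the search term from the question, here is a minimal sketch. The constant name PHRASE_OR_WORD is just a label chosen for illustration, and the one-line pattern is meant to be the same alternation as the free-spacing version above with the comments removed.

```ruby
# One-line form of the free-spacing pattern: a quoted phrase with no
# leading or trailing whitespace inside the quotes, or a single word.
PHRASE_OR_WORD = /(?<=")(?!\s)[^"]+(?<!\s)(?=")|\w+/

search_term = 'pizza cheese "ham and pineapple" pepperoni'

# scan returns every non-overlapping match, left to right.
terms = search_term.scan(PHRASE_OR_WORD)
p terms
#=> ["pizza", "cheese", "ham and pineapple", "pepperoni"]
```

The resulting array should be usable as the list of phrases to highlight, one entry per word or quoted phrase.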
Upvotes: 1
Reputation: 2981
Yes:
/"[^"]*?"|\w+/
https://regex101.com/r/fzHI4g/2
This isn't done as a split. It just matches either a quoted phrase or a single word, and each one is a separate match.
£ cat pizza
pizza "a and b" pie
£ ruby -ne 'print $_.scan(/"[^"]*?"|\w+/)' pizza
["pizza", "\"a and b\"", "pie"]
£
So search_term.scan(/regex/) seems to return the array you want.
To exclude the quotes, put them in lookarounds, which assert that the matched expression has a quote before it (lookbehind) and a quote after it (lookahead) rather than containing the quotes:
/(?<=")\w[^"]*?(?=")|\w+/
Note that because the last regex doesn't consume the quotes, it relies on the first character after a quote being a word character to tell an opening quote from a closing one, so a phrase with leading whitespace such as " a bear" is not matched as a phrase. This can be solved with capture groups, but if this is an issue, like I said in the comments, I would recommend just trimming the quotes off each array element and using the regex at the top of the answer.
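Here is a rough sketch of the limitation and of both workarounds, using a search term with a leading space inside the quotes. The variable names are only for illustration, and the capture-group pattern is one possible way to write it, not the only one.

```ruby
search_term = 'pizza cheese " ham and pineapple" pepperoni'

# Lookaround version: the phrase starts with a space, so it is not
# recognised as a phrase and falls apart into single words.
lookaround = search_term.scan(/(?<=")\w[^"]*?(?=")|\w+/)
p lookaround
#=> ["pizza", "cheese", "ham", "and", "pineapple", "pepperoni"]

# Workaround 1: match the quotes, then delete them from each element.
trimmed = search_term.scan(/"[^"]*?"|\w+/).map { |term| term.delete('"') }
p trimmed
#=> ["pizza", "cheese", " ham and pineapple", "pepperoni"]

# Workaround 2: capture groups. With groups, scan returns one
# [phrase, word] pair per match, and exactly one of the two is nil.
captured = search_term.scan(/"([^"]*)"|(\w+)/).map { |phrase, word| phrase || word }
p captured
#=> ["pizza", "cheese", " ham and pineapple", "pepperoni"]
```

Both workarounds keep the leading space inside the phrase; add a .map(&:strip) at the end if you don't want it.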
Upvotes: 4