davideghz

Reputation: 3685

ruby regex extract word between single quotes

I'm looking for a regex to match:

ciao: c'iao 'ciao'

with:

ciao #every word excluding non-word character
c'iao #including apostrophes
ciao #excluding the quotes ''

So far I've been able to match the first 2 requirements with:

/[\w']+/

but I'm struggling to extract a word between single quotes (without including the quotes). Note that I won't have a case where a word containing an apostrophe is itself enclosed in quotes (like 'c'iao').

I've seen many similar Q&As but couldn't find one that suits my needs. Extra points for an answer that includes a brief explanation :)

Upvotes: 3

Views: 1717

Answers (2)

Cary Swoveland

Reputation: 110685

Considering that words can begin or end with an apostrophe, or contain multiple apostrophes, I suggest first splitting on whitespace, then removing pairs of single quotes that enclose words.

str = "'Twas because Bo didn't like Bess' or y'all's 'attitude'"

str.split.map { |s| s =~ /\A'.+'\z/ ? s[1..-2] : s }
  #=> ["'Twas", "because", "Bo", "didn't", "like", "Bess'", "or", "y'all's", "attitude"]

The first step produces

arr = str.split
  #=> ["'Twas", "because", "Bo", "didn't", "like", "Bess'", "or", "y'all's", "'attitude'"]

The regex matches elements of arr that begin and end with a single quote.
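To make that concrete, here is a minimal sketch of which elements the regex selects and what `s[1..-2]` does to them. The names `quoted`, `stripped` and `question_words` are illustrative only and not part of the original answer.

```ruby
# Illustrative sketch; `quoted`, `stripped` and `question_words` are made-up names.
str = "'Twas because Bo didn't like Bess' or y'all's 'attitude'"
arr = str.split

# Keep only the elements that both start and end with a single quote.
# (String#match? needs Ruby 2.4 or later.)
quoted = arr.select { |s| s.match?(/\A'.+'\z/) }
#=> ["'attitude'"]

# s[1..-2] drops the first and last character, i.e. the enclosing quotes.
stripped = quoted.map { |s| s[1..-2] }
#=> ["attitude"]

# The same steps applied to the string from the question. Splitting on
# whitespace leaves punctuation attached, so the colon stays on "ciao:".
question_words = "ciao: c'iao 'ciao'".split.map { |s| s =~ /\A'.+'\z/ ? s[1..-2] : s }
#=> ["ciao:", "c'iao", "ciao"]
```

Because this approach only splits on whitespace, any punctuation that should be dropped (such as the colon above) would need a separate clean-up step.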

Upvotes: 0

Wiktor Stribiżew

Reputation: 626851

You can use the following expression:

/\w+(?:'\w+)*/

See the Rubular demo

The expression matches:

  • \w+ - 1 or more word chars
  • (?:'\w+)* - zero or more repetitions (the * quantifier applied to the non-capturing group (?:...)) of the sequence:
    • ' - apostrophe
    • \w+ - 1 or more word chars.
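The same breakdown can be written directly into the pattern with Ruby's extended (`/x`) mode, which ignores whitespace and allows comments inside the regex. This is only a sketch; the constant name `WORD` is an assumption made for illustration.

```ruby
# Sketch: the same expression in free-spacing mode. `WORD` is a hypothetical name.
WORD = /
  \w+      # one or more word characters
  (?:      # start of a non-capturing group
    '      #   a literal apostrophe
    \w+    #   followed by one or more word characters
  )*       # the whole group may repeat zero or more times
/x

words = "ciao: c'iao 'ciao'".scan(WORD)
#=> ["ciao", "c'iao", "ciao"]
```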

See a short Ruby demo here:

"ciao: c'iao 'ciao'".scan(/\w+(?:'\w+)*/)
# => ["ciao", "c'iao", "ciao"]
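For comparison, a short sketch of how this differs from the /[\w']+/ attempt in the question. Because every match must start and end with a word character, an apostrophe is only kept when it sits between word characters. The variable names and the second sample string are illustrative assumptions.

```ruby
# Sketch comparing the question's pattern with the one above; names are illustrative.
input = "ciao: c'iao 'ciao'"

# The character class treats the enclosing quotes as part of the word.
loose = input.scan(/[\w']+/)
#=> ["ciao", "c'iao", "'ciao'"]

# Here an apostrophe is matched only when word characters follow it,
# and a match can only begin with a word character.
strict = input.scan(/\w+(?:'\w+)*/)
#=> ["ciao", "c'iao", "ciao"]

# Side effect: genuine leading or trailing apostrophes are dropped too.
edge = "'Twas Bess'".scan(/\w+(?:'\w+)*/)
#=> ["Twas", "Bess"]
```

If leading or trailing apostrophes such as the one in 'Twas need to be preserved, this pattern alone will not do it.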

Upvotes: 5
