Jake McAllister

Reputation: 1075

How to match accented characters in a regex?

I have the block of text defined as new_text below, and a gsub block that runs through the text and should replace this bit

@[James Andrés Trento D.](content:25)

with

@James

However, because there is an é in the name, \w isn't matching the word. I have tried using

[:alpha:]

without any luck. Does anyone know how I can get my regular expression to match accents?

new_text = "I have a video of @[James Andrés Trento D](content:25) dancing, but too big! May 5 - 9."

new_text.gsub! /@\[(?<name>[\w\s\-\']+)\]\(content:(?<userid>\d+)\)/ do
  m = $~
  name, id = m[:name], m[:userid]
  "@#{name.split(' ').first}"
end
puts new_text

Upvotes: 0

Views: 254

Answers (1)

Maxim

Reputation: 9961

One possible solution is to accept every character other than ] as part of the name:

@\[(?<name>[^\]]+)\]\(content:(?<userid>\d+)\)
            ^^^ <- match any character except `]`
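
As a rough sketch, here is that pattern dropped into the gsub from the question. The second pattern is not part of the answer above; it is an assumed alternative that keeps the original character class but swaps \w for \p{L}, which Ruby matches against any Unicode letter. The helper name shorten_mentions is made up for illustration.

```ruby
# Sample text copied from the question.
new_text = "I have a video of @[James Andrés Trento D](content:25) dancing, but too big! May 5 - 9."

# Pattern from the answer: the name is every character up to the closing bracket.
any_but_bracket = /@\[(?<name>[^\]]+)\]\(content:(?<userid>\d+)\)/

# Assumed alternative (not from the answer): \p{L} matches any Unicode letter,
# so "é" is accepted while digits and most punctuation are still rejected.
unicode_letters = /@\[(?<name>[\p{L}\s\-']+)\]\(content:(?<userid>\d+)\)/

# Hypothetical helper: replace each mention with "@" plus the first word of the name.
def shorten_mentions(text, pattern)
  text.gsub(pattern) { "@#{Regexp.last_match[:name].split(' ').first}" }
end

puts shorten_mentions(new_text, any_but_bracket)
puts shorten_mentions(new_text, unicode_letters)
```

The negated class is the more permissive of the two: it would also accept the trailing period in "Trento D.", whereas the letter-based class would need `.` added to it before that form of the name matches.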

Upvotes: 1
