Jan

Reputation: 6828

How can I scan a Ruby string containing Unicode characters?

In irb I enter

"#gefährlicher #blödsinn".scan(/#(\w+)/).flatten

irb responds with:

"#gef��hrlicher #bl��dsinn".scan(/#(\w+)/).flatten

and I get

=> ["gef", "bl"]

which is obviously not what I want.

What am I doing wrong here?

Upvotes: 1

Views: 1650

Answers (1)

Darshan Rivka Whittle

Reputation: 34031

As per this answer and the Regexp documentation, \w matches only [a-zA-Z0-9_]. You want \p{Word}.

"#gefährlicher #blödsinn".scan(/#(\p{Word}+)/).flatten
# => ["gefährlicher", "blödsinn"]
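
For comparison, here is a minimal sketch that runs the same scan with three character classes side by side. It assumes Ruby 1.9 or later and a UTF-8 encoded string; the variable names are only illustrative. The POSIX bracket class [[:word:]] is another Unicode-aware spelling that should behave like \p{Word} here.

```ruby
# encoding: utf-8
# Assumption: the source file and the string are both UTF-8.
text = "#gefährlicher #blödsinn"

# \w is ASCII-only, so each match stops at the first umlaut.
ascii_only = text.scan(/#(\w+)/).flatten

# \p{Word} matches Unicode word characters (letters, marks, numbers, connector punctuation).
unicode_property = text.scan(/#(\p{Word}+)/).flatten

# [[:word:]] is the POSIX bracket form, also Unicode-aware on Unicode strings.
posix_bracket = text.scan(/#([[:word:]]+)/).flatten

p ascii_only
p unicode_property
p posix_bracket
```

If the umlauts already look garbled when the line is echoed back, the terminal or irb encoding may also be worth checking, independently of the regex.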

That said, I don't know what you mean by "irb responds with..." Obviously irb responds with the => part...

Upvotes: 4
