Reputation: 32972
We're trying to find a regular expression that lets us split sentences into words.
The obvious answer is to use \w, except that it treats _ as a word character, so it doesn't split on underscores, which we need.
Then we tried [a-zA-Z0-9] (we'd like to allow numbers inside words). The problem is that it splits on accented characters, which are fairly common in many languages.
So, ideally, what regexp should I use to split the following sentence
"Je ne déguste pas d'asperges, car je n'aime pas ça"
into these words:
["Je", "ne", "déguste", "pas", "d", "asperges", "car", "je", "n", "aime", "pas", "ça"]
Upvotes: 3
Views: 1848
Reputation: 4509
str = "Je ne déguste pas d'asperges, car je n'aime pas ça"
# Split on runs of whitespace, commas, and apostrophes
words = str.split(/[\s,']+/)
words.each { |w| puts w }
The output is:
Je
ne
déguste
pas
d
asperges
car
je
n
aime
pas
ça
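Note that the character class above only lists the separators that occur in this one sentence. If other punctuation or underscores can show up, one possible alternative is to match the words instead of the separators, using Unicode property classes. The following is only a sketch: it assumes Ruby 1.9 or later with UTF-8 strings, and split_words is a hypothetical helper name, not something from the question.

```ruby
# Hypothetical helper (not from the original answer): collect the words
# instead of listing every possible separator.
def split_words(sentence)
  # \p{L} matches any Unicode letter (including accented ones),
  # \p{N} matches any Unicode digit. The underscore belongs to
  # neither class, so it ends a word just like punctuation does.
  sentence.scan(/[\p{L}\p{N}]+/)
end

sentence = "Je ne déguste pas d'asperges, car je n'aime pas ça"
p split_words(sentence)
# => ["Je", "ne", "déguste", "pas", "d", "asperges", "car", "je", "n", "aime", "pas", "ça"]

p split_words("foo_bar 42")
# => ["foo", "bar", "42"]
```

Whether scan or split is the better fit probably depends on how predictable the separators are. With scan, anything that is not a letter or a digit is treated as a boundary.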
Upvotes: 3