Reputation: 2742
When I split the string "hello world \n" with
"hello world \n".scan(/\w+/)
I get ["hello", "world"]
I would like \n and \t to be counted as strings as well.
Upvotes: 5
Views: 3919
Reputation: 29493
Do not use \w+ for counting words. It splits numbers, and words containing Unicode characters, like so:
"The floating point number is 13.5812".scan /\w+/
=> ["The", "floating", "point", "number", "is", "13", "5812"]
The same is true for numbers with other delimiters, like "12,000".
In Ruby 1.8 the expression \w+ worked with Unicode, but this has changed. If there are Unicode characters in your string, the words will be split there, too.
"Die Apfelbäume".scan /\w+/
=> ["Die", "Apfelb", "ume"]
There are two options here.
If you want to skip numbers altogether, just use
/\p{Letter}+/
If you don't want to skip numbers because you want to count them as words, too, use
/\S+/
The expression \S+ matches runs of non-whitespace characters, i.e. characters matched by /[^ \t\r\n\f]/. The only disadvantage is that your words will have other characters attached to them, such as brackets, hyphens and dots. For the sole purpose of counting, this should not be a problem.
If you want the words themselves, too, you would need to strip those extra characters afterwards.
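A small sketch comparing both options on the sample strings above. The helper names are hypothetical, and the punctuation-stripping step is only one possible way to do the "additional character stripping"; it trims leading and trailing [[:punct:]] characters from each \S+ token. \p{L} is the short name of the Unicode "Letter" category used above.

```ruby
# Hypothetical helpers; only String#scan and the two regexes come from this answer.

# Letters only: numbers are skipped, non-ASCII letters like "ä" are kept.
def letter_words(text)
  text.scan(/\p{L}+/)
end

# Any run of non-whitespace: numbers stay whole, punctuation stays attached.
def non_space_words(text)
  text.scan(/\S+/)
end

# \S+ tokens with leading/trailing punctuation trimmed (an assumed stripping rule),
# dropping tokens that consisted only of punctuation.
def stripped_words(text)
  non_space_words(text)
    .map { |w| w.gsub(/\A[[:punct:]]+|[[:punct:]]+\z/, "") }
    .reject(&:empty?)
end

sample = "The floating point number is 13.5812"
p letter_words(sample)              # => ["The", "floating", "point", "number", "is"]
p non_space_words(sample).size      # => 6
p letter_words("Die Apfelbäume")    # => ["Die", "Apfelbäume"]
p stripped_words("(Hello), world.") # => ["Hello", "world"]
```

Note that the \S+ variant keeps "13.5812" as a single token, which is why it counts 6 words where the letters-only pattern counts 5.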
Upvotes: 4
Reputation: 398
This is better if you don't want to split up words with apostrophes (isn't, 90's, etc.):
"hello world \n".split(/[^\w']+/)
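A quick check of this pattern on a made-up sentence (the sample text is an assumption, not from the answer). Like \w+, it still discards \n and \t, so it addresses apostrophes rather than the control characters the question asks about.

```ruby
# Split on every run of characters that is neither a word character nor an apostrophe,
# so contractions and forms like "90's" stay in one piece.
text  = "It isn't the 90's, is it?"
words = text.split(/[^\w']+/)
p words # => ["It", "isn't", "the", "90's", "is", "it"]

# split keeps an empty first field when the string starts with a separator;
# scanning for the complementary class avoids that.
leading_split = " hello".split(/[^\w']+/)
leading_scan  = " hello".scan(/[\w']+/)
p leading_split # => ["", "hello"]
p leading_scan  # => ["hello"]
```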
Upvotes: 1
Reputation: 1241
You can use the named character class [:cntrl:] (written [[:cntrl:]] inside a regex), which matches control characters such as \n and \t:
irb(main):001:0> "hello world \n".scan(/\w+|[[:cntrl:]]/)
=> ["hello", "world", "\n"]
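A slightly larger sketch of the same pattern with a tab in the input, also counting how many tokens are control characters. The input string is made up for illustration. If you only want \n and \t specifically, a narrower class such as [\n\t] would be an alternative, since [[:cntrl:]] also matches other control characters like \r.

```ruby
# \w+ collects words; [[:cntrl:]] collects single control characters such as \t and \n.
TOKEN = /\w+|[[:cntrl:]]/

tokens = "hello\tworld \n".scan(TOKEN)
p tokens      # => ["hello", "\t", "world", "\n"]
p tokens.size # => 4

# Count only the tokens that are a single control character.
control_count = tokens.count { |t| t =~ /\A[[:cntrl:]]\z/ }
p control_count # => 2
```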
Upvotes: 0
Reputation: 12100
In double-quoted strings, \n has a special meaning: it becomes a newline (line feed) character, which counts as whitespace.
If you want a literal backslash followed by n, escape the backslash: \\n.
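A short illustration of the difference between the escape sequence and the escaped backslash. The single-quoted comparison is an extra observation, not part of the original answer: Ruby does not interpret \n inside single quotes.

```ruby
real    = "hello world \n"  # \n becomes one newline character
escaped = "hello world \\n" # \\n is a backslash followed by the letter n

p real.length                 # => 13
p escaped.length              # => 14
p escaped == 'hello world \n' # => true (single quotes keep \n literal)
p escaped.scan(/\w+/)         # => ["hello", "world", "n"]
```

Note that with the escaped form, \w+ picks up the letter "n" as a separate word, which may not be what you want either.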
If you want to split your string by spaces only, you should use
"Hello world \n".split(/ /)
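A sketch of how the regex form differs from passing a plain " " string, which String#split treats specially (awk-style splitting on any whitespace). The "a  b" input is made up to show the consecutive-space case.

```ruby
s = "Hello world \n"

# A regex of one space splits on each literal space, so "\n" survives as a field.
by_regex = s.split(/ /)
p by_regex # => ["Hello", "world", "\n"]

# A plain " " string argument splits on runs of any whitespace and
# ignores leading whitespace, so "\n" disappears.
by_string = s.split(" ")
p by_string # => ["Hello", "world"]

# With the regex form, consecutive spaces produce empty fields.
double = "a  b".split(/ /)
p double # => ["a", "", "b"]
```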
Upvotes: 3