Prabesh Shrestha

Reputation: 2742

Split words in Ruby for counting

When I split the string "hello world \n" with

"hello world \n".scan(/\w+/)

I get ["hello", "world"]

I would like \n and \t to be counted as strings as well.

Upvotes: 5

Views: 3919

Answers (6)

Konrad Reiche

Reputation: 29493

Do not use \w+ for counting words. It splits numbers that contain a decimal point and words that contain non-ASCII characters. For example:

"The floating point number is 13.5812".scan /\w+/
=> ["The", "floating", "point", "number", "is", "13", "5812"]

The same is true for numbers with other delimiters like "12,000".

In Ruby 1.8 the expression \w+ worked with Unicode, but this has changed. If your string contains non-ASCII characters, words are split at those characters, too:

"Die Apfelbäume".scan /\w+/
=> ["Die", "Apfelb", "ume"]

There are two options here.

  1. You want to skip numbers altogether. Fine, just use

    /\p{Letter}+/
    
  2. You don't want to skip numbers, because you want to count them as words, too. Then use

    /\S+/
    

    The expression \S+ matches runs of non-whitespace characters, /[^ \t\r\n\f]/. The only disadvantage is that your words will have other characters attached to them, such as brackets, hyphens, and dots. For the sole purpose of counting, this should not be a problem.

    If you also want the words themselves, you need to strip those extra characters afterwards.
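
A rough sketch of both options plus the stripping step. The sample sentence is made up (it is not from the question), and it assumes that "stripping" means trimming non-alphanumeric characters from both ends of each token:

```ruby
# Made-up sample text, used only for illustration.
text = "Die Apfelbäume (ca. 12,000) kosten 13.5812 Euro."

# Option 1: letters only, so the numbers are skipped.
letters_only = text.scan(/\p{Letter}+/)

# Option 2: every run of non-whitespace characters is one token.
tokens = text.scan(/\S+/)

# Trim leading and trailing characters that are neither letters nor digits.
stripped = tokens.map { |t| t.gsub(/\A[^\p{Alnum}]+|[^\p{Alnum}]+\z/, "") }

puts letters_only.size # 5
puts tokens.size       # 7
p stripped
```

Here 12,000 and 13.5812 each stay a single token, whereas \w+ would have split them.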

Upvotes: 4

Alex

Reputation: 398

This is better if you don't want to split up words with apostrophes (isn't, 90's, etc.):

"hello world \n".split(/[^\w']+/)

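A small sketch of how this behaves, using made-up input strings. One caveat that is easy to miss: if the string starts with a separator, split returns a leading empty string, which would inflate a word count. Scanning for the inverse character class avoids that:

```ruby
# Made-up inputs, used only for illustration.
words = "it isn't the 90's \n".split(/[^\w']+/)
p words        # the apostrophes stay inside their words

# Leading whitespace produces an empty first element with split...
with_leading = "  hello".split(/[^\w']+/)
p with_leading

# ...but not with scan, which collects the matches instead of the gaps.
scanned = "  hello".scan(/[\w']+/)
p scanned
```
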
Upvotes: 1

kyanny

Reputation: 1241

You can use the POSIX bracket expression [[:cntrl:]], which matches control characters such as \n and \t:

irb(main):001:0> "hello world \n".scan(/\w+|[[:cntrl:]]/)
=> ["hello", "world", "\n"]
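
Keep in mind that [[:cntrl:]] covers more than \n and \t. A short sketch with made-up inputs shows carriage returns and escape characters being counted as well, which may or may not be what you want:

```ruby
pattern = /\w+|[[:cntrl:]]/

# Made-up inputs, used only for illustration.
from_question = "hello world \n".scan(pattern)
with_crlf     = "a\tb\r\n".scan(pattern)   # \r is a control character too
with_escape   = "x\e y".scan(pattern)      # so is the escape character \e

p from_question
p with_crlf
p with_escape
```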

Upvotes: 0

EdvardM

Reputation: 3072

"hello world \n".scan /[\w\n\t]+/

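One thing to watch for with this pattern, sketched here with a made-up input in which the newline sits directly between two words: because \n and \t are in the same character class as \w, they are glued to neighbouring words instead of being counted on their own. An alternation keeps them separate:

```ruby
# Made-up input: no space around the newline.
input = "hello\nworld"

fused    = input.scan(/[\w\n\t]+/)   # one token containing the newline
separate = input.scan(/\w+|[\n\t]/)  # three tokens

p fused.size    # 1
p separate.size # 3
```
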
Upvotes: 2

Yossi

Reputation: 12100

In double-quoted strings, \n has a special meaning: it is replaced by a newline character, which counts as whitespace. If you want a literal backslash followed by n, escape the backslash: \\n.

If you want to split your string by spaces only, you should use

"Hello world \n".split(/ /)

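To make the difference visible, here is a small sketch comparing the spellings. The variable names are only for illustration:

```ruby
escaped_newline   = "hello world \n"   # ends with one newline character
literal_backslash = "hello world \\n"  # ends with a backslash followed by n
single_quoted     = 'hello world \n'   # single quotes leave \n unprocessed

p escaped_newline.length               # 13
p literal_backslash.length             # 14
p single_quoted == literal_backslash   # true

# With a literal backslash, \S+ picks up the two characters as a token.
backslash_tokens = literal_backslash.scan(/\S+/)

# Splitting on single spaces keeps the real newline as its own element.
space_split = escaped_newline.split(/ /)

p backslash_tokens
p space_split
```
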
Upvotes: 3

Dutow

Reputation: 5668

Do you want something like this?

"hello world \n".scan(/\w+|\n/)

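Since the question also mentions \t, a possible extension is sketched below. count_tokens is a hypothetical helper name (not part of Ruby), and the second input is made up:

```ruby
# count_tokens is a hypothetical helper, named here only for illustration.
def count_tokens(text)
  # \w+ matches a run of word characters; [\n\t] matches one newline or tab.
  tokens = text.scan(/\w+|[\n\t]/)
  controls, words = tokens.partition { |t| t == "\n" || t == "\t" }
  { total: tokens.size, words: words.size, controls: controls.size }
end

from_question = count_tokens("hello world \n")    # total 3: 2 words, 1 control
with_tab      = count_tokens("hello\tworld \n")   # total 4: 2 words, 2 controls

p from_question
p with_tab
```
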
Upvotes: 5
