Reputation: 251
Given the following inputs:
line1 = "Hey | Hello | Good | Morning"
line2 = "Hey , Hello , Good , Morning"
file1=length1=name1=title1=nil
Using ',' to split the string as follows:
file1, length1, name1, title1 = line2.split(/,\s*/)
I get the following output:
puts file1,length1,name1,title1
>Hey
>Hello
>Good
>Morning
However, using '|' to split the string I receive a different output:
file1, length1, name1, title1 = line1.split(/|\s*/)
puts file1,length1,name1,title1
>H
>e
>y
Both strings are the same except for the separator (a comma in the first case and a pipe in the second). The call to split is also the same except, of course, for the delimiting character. What causes this difference?
Upvotes: 4
Views: 114
Reputation: 56809
The problem is that | means OR (alternation) in a regex. If you want the literal character, you need to escape it as \|. So the correct regex is /\|\s*/
Currently, the regex /|\s*/ means "an empty string or a run of whitespace characters". Since the empty string comes first in the alternation, the regex engine breaks the string up at every character (you can imagine an empty string between each pair of characters). If you swap it to /\s*|/, whitespace is preferred over the empty string where possible, and there will be no whitespace in the list of tokens after splitting.
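To make the difference concrete, here is a small sketch that runs the variants discussed above against the pipe-separated string from the question. The variable names unescaped, swapped, escaped and trimmed are only illustrative, and the last pattern, /\s*\|\s*/, is an extra suggestion rather than part of the original answer: it also consumes whitespace before the pipe, so the tokens come back without trailing spaces.

```ruby
line1 = "Hey | Hello | Good | Morning"

# Unescaped pipe: "empty string OR whitespace". The empty alternative
# always matches first, so the string is split between every character.
unescaped = line1.split(/|\s*/)
p unescaped.first(4)   # => ["H", "e", "y", " "]

# Swapped alternation: whitespace is consumed where it exists, so the
# tokens are single characters with no whitespace among them.
swapped = line1.split(/\s*|/)
p swapped.first(5)     # => ["H", "e", "y", "|", "H"]

# Escaped pipe: a literal "|" followed by optional whitespace.
# The space before each pipe is not consumed and stays in the token.
escaped = line1.split(/\|\s*/)
p escaped              # => ["Hey ", "Hello ", "Good ", "Morning"]

# Assumption: trailing spaces are unwanted, so also match whitespace
# before the pipe.
trimmed = line1.split(/\s*\|\s*/)
p trimmed              # => ["Hey", "Hello", "Good", "Morning"]

file1, length1, name1, title1 = trimmed
puts file1, length1, name1, title1
```

Note that the comma version in the question, /,\s*/, leaves the same kind of trailing space in each token; it just isn't visible when printed with puts.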
Upvotes: 7