Shpigford

Reputation: 25378

Split on different newlines

Right now I'm doing a split on a string and assuming that the newline from the user is \r\n like so:

string.split(/\r\n/)

What I'd like to do is split on either \r\n or just \n.

So what would the regex be to split on either of those?

Upvotes: 55

Views: 31067

Answers (8)

Clark

Reputation: 3013

\n is the Unix line ending
\r is the classic Mac OS (pre-OS X) line ending
\r\n is the Windows line ending

To be safe across operating systems, I would use /\r?\n|\r\n?/

"1\r2\n3\r\n4\n\n5\r\r6\r\n\r\n7".split(/\r?\n|\r\n?/)
=> ["1", "2", "3", "4", "", "5", "", "6", "", "7"]

Upvotes: 6

Matt Sanders

Reputation: 10865

Another option is to use String#chomp, which also handles newlines intelligently by itself.

You can accomplish what you are after with something like:

lines = string.lines.map(&:chomp)

Or if you are dealing with something large enough that memory use is a concern:

# each_line works on a String or on any IO object (e.g. an open File)
string.each_line do |line|
  line.chomp!
  # do work...
end

Performance isn't always the most important thing when solving this kind of problem, but it is worth noting the chomp solution is also a bit faster than using a regex.

On my machine (i7, ruby 2.1.9):

Warming up --------------------------------------
           map/chomp    14.715k i/100ms
  split custom regex    12.383k i/100ms
Calculating -------------------------------------
           map/chomp    158.590k (± 4.4%) i/s -    794.610k in   5.020908s
  split custom regex    128.722k (± 5.1%) i/s -    643.916k in   5.016150s

Upvotes: 0

23inhouse

Reputation: 1927

Ruby has the methods String#each_line and String#lines.

each_line returns an enumerator when called without a block: http://www.ruby-doc.org/core-1.9.3/String.html#method-i-each_line

lines returns an array: http://www.ruby-doc.org/core-2.1.2/String.html#method-i-lines

I didn't test it against your scenario but I bet it will work better than manually choosing the newline chars.
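For anyone who wants to check this, here is a small sketch of how String#lines behaves with mixed line endings. The sample string is made up for illustration. Note that lines keeps each line's terminator, so a chomp is still needed to get bare lines.

```ruby
# Hypothetical sample input mixing Windows and Unix line endings.
text = "one\r\ntwo\nthree"

# String#lines splits after each "\n" and keeps the terminator on each line.
raw = text.lines

# String#chomp with no argument strips a trailing "\r\n", "\n", or "\r".
clean = raw.map(&:chomp)

p raw    # ["one\r\n", "two\n", "three"]
p clean  # ["one", "two", "three"]
```

On Ruby 2.4 or later, text.lines(chomp: true) should give the same result in one call.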

Upvotes: 18

Andrew Grimm

Reputation: 81691

Are you reading from a file, or from standard in?

If you're reading from a file, and the file is in text mode, rather than binary mode, or you're reading from standard in, you won't have to deal with \r\n - it'll just look like \n.

C:\Documents and Settings\username>irb
irb(main):001:0> gets
foo
=> "foo\n"

Upvotes: 2

Jörg W Mittag

Reputation: 369633

The alternation operator in Ruby Regexp is the same as in standard regular expressions: |

So, the obvious solution would be

/\r\n|\n/

which is the same as

/\r?\n/

i.e. an optional \r followed by a mandatory \n.
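A quick way to convince yourself that the two patterns behave the same for splitting is to run both on a string with mixed endings. The sample string here is invented for the demonstration.

```ruby
# Hypothetical sample with one Windows and one Unix line ending.
sample = "a\r\nb\nc"

with_alternation = sample.split(/\r\n|\n/)
with_optional    = sample.split(/\r?\n/)

p with_alternation                    # ["a", "b", "c"]
p with_alternation == with_optional   # true
```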

Upvotes: 3

SjoerdRavn

Reputation: 11

Perhaps do a split on only '\n' and remove the '\r' if it exists?
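A minimal sketch of that idea, assuming the input is already in a variable (the name input and its contents are made up here). It splits on "\n" and then strips a single trailing "\r" from each piece with String#chomp.

```ruby
# Hypothetical input with mixed line endings.
input = "first\r\nsecond\nthird"

# Split on "\n" only, then drop a trailing "\r" where one is left over.
lines = input.split("\n").map { |line| line.chomp("\r") }

p lines  # ["first", "second", "third"]
```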

Upvotes: 1

Phrogz

Reputation: 303559

# Split on \r\n or just \n
string.split( /\r?\n/ )

Although it doesn't help with this question (where you do need a regex), note that String#split does not require a regex argument. Your original code could also have been string.split( "\r\n" ).
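To illustrate the difference, here is a rough comparison of a string separator and the regex on made-up inputs. The string form only matches the exact sequence "\r\n", which is why it falls short once the endings are mixed.

```ruby
# Hypothetical inputs: one with pure Windows endings, one mixed.
windows_text = "x\r\ny\r\nz"
mixed_text   = "x\r\ny\nz"

string_split_windows = windows_text.split("\r\n")
string_split_mixed   = mixed_text.split("\r\n")
regex_split_mixed    = mixed_text.split(/\r?\n/)

p string_split_windows  # ["x", "y", "z"]
p string_split_mixed    # ["x", "y\nz"]  (the lone "\n" is not a separator)
p regex_split_mixed     # ["x", "y", "z"]
```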

Upvotes: 15

NickAldwin

Reputation: 11754

Did you try /\r?\n/ ? The ? makes the \r optional.

Example usage: http://rubular.com/r/1ZuihD0YfF

Upvotes: 79
