user2012677

Reputation: 5745

CSV::MalformedCSVError: New line must be <"\n\r">

I'm trying to parse this file with Ruby's CSV library:

https://www.sec.gov/files/data/broker-dealers/company-information-about-active-broker-dealers/bd070219.txt

However, I am getting an error.

CSV.open(file_name, "r", { :col_sep => "\t", :row_sep => "\n\r" }).each do |row|
    puts row
end

CSV::MalformedCSVError: New line must be <"\n\r"> not <"\r"> in line 1.

Upvotes: 3

Views: 3100

Answers (2)

Schwern

Reputation: 165218

The Windows row_sep is "\r\n", not "\n\r". However, this CSV is malformed: looking at it in a hex editor, each row appears to end with "\r\r\n".

It's tab-delimited.

In addition, it does not use proper quoting. Line 247, for example, contains 600 "B" STREET STE. 2204, so you need to turn off quote characters.

quote_char: nil, col_sep: "\t", row_sep: "\r\r\n"
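To illustrate why the quote character has to be turned off, here is a minimal sketch. The sample record is made up for illustration (only the address text comes from the file), and it assumes a csv version that accepts quote_char: nil.

```ruby
require "csv"

# Hypothetical one-record sample; only the address text is taken from the file.
sample = "0000000001\tEXAMPLE CO\t600 \"B\" STREET STE. 2204\t\r\r\n"

# With the default quote_char ('"'), the stray quotes count as broken quoting.
default_result =
  begin
    CSV.parse(sample, col_sep: "\t", row_sep: "\r\r\n")
  rescue CSV::MalformedCSVError => e
    e.class
  end

# With quote_char: nil, double quotes are treated as ordinary characters.
rows = CSV.parse(sample, quote_char: nil, col_sep: "\t", row_sep: "\r\r\n")

p default_result
p rows.first[2]
```

The first call should end in CSV::MalformedCSVError, while the second keeps the address, quotes included, as a plain field.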

There's also an extra tab at the end: each line ends with \t\r\r\n. You can look at it as using a row_sep of "\r\n" with an extra \r field.

quote_char: nil, col_sep: "\t", row_sep: "\r\n"

Or you can view it as having a row_sep of \t\r\r\n and no extra field.

quote_char: nil, col_sep: "\t", row_sep: "\t\r\r\n"

Either way, it's a mess.
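As a rough sketch of the last two views, the following parses the first record of the file (repeated once so there are two rows) both ways. The variable names are only illustrative, and dropping the last field in the second view is one possible cleanup, not the only one.

```ruby
require "csv"

# First record of the file, repeated once so there are two rows to parse.
line = "0000001904\tABRAHAM SECURITIES CORPORATION\t00829452\t" \
       "3724 47TH STREET CT. NW\t \tGIG HARBOR\tWA\t98335\t\r\r\n"
data = line * 2

# View 1: the trailing tab belongs to the row separator, so no extra field.
clean = CSV.parse(data, quote_char: nil, col_sep: "\t", row_sep: "\t\r\r\n")

# View 2: "\r\n" ends the row, which leaves a stray "\r" as the last field.
stray = CSV.parse(data, quote_char: nil, col_sep: "\t", row_sep: "\r\n")

# Dropping the stray last field gives the same rows as view 1.
trimmed = stray.map { |row| row[0...-1] }

p clean.first
p stray.first.last
p trimmed == clean
```

The same options should carry over to CSV.foreach or CSV.open when reading the real file.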


I used a hex editor to look at the file as text and raw data side by side. This let me see what's truly at the end of the line.

87654321  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789abcdef                       
00000000: 3030 3030 3030 3139 3034 0941 4252 4148  0000001904.ABRAH
00000010: 414d 2053 4543 5552 4954 4945 5320 434f  AM SECURITIES CO
00000020: 5250 4f52 4154 494f 4e09 3030 3832 3934  RPORATION.008294
00000030: 3532 0933 3732 3420 3437 5448 2053 5452  52.3724 47TH STR
00000040: 4545 5420 4354 2e20 4e57 0920 0947 4947  EET CT. NW. .GIG
00000050: 2048 4152 424f 5209 5741 0939 3833 3335   HARBOR.WA.98335
00000060: 090d 0d0a 3030 3030 3030 3233 3033 0950  ....0000002303.P
          ^^^^^^^^^

Hex 09 0d 0d 0a is \t\r\r\n.
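If no hex editor is at hand, the trailing bytes can also be checked from Ruby. This is a small sketch using the end of the first record as a literal string; tail_hex is a hypothetical helper name, not part of any library.

```ruby
# Hypothetical helper: hex dump of the last n bytes of a string.
def tail_hex(line, n = 4)
  line.bytes.last(n).map { |b| format("%02x", b) }.join(" ")
end

# End of the first record, as seen in the hex dump.
line = "WA\t98335\t\r\r\n"

tail = tail_hex(line)
puts tail
```

For the real file, the line could be read with File.open(file_name, "rb", &:readline). Binary mode is an assumption here, meant to keep Ruby from translating line endings on Windows.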

Alternatively, you can print the lines with p and any invisible characters will be revealed.

# The block form closes the file automatically.
File.open(file_name) do |f|
  p f.readline
end

"0000001904\tABRAHAM SECURITIES CORPORATION\t00829452\t3724 47TH STREET CT. NW\t \tGIG HARBOR\tWA\t98335\t\r\r\n"

Upvotes: 4

GProst

Reputation: 10237

Use :row_sep => :auto instead of :row_sep => "\n\r":

# Options are passed without braces; a braced hash raises ArgumentError on Ruby 3+.
CSV.open(file_name, "r", :col_sep => "\t", :row_sep => :auto).each do |row|
    puts row
end

Upvotes: 1
