I'm trying to parse this file with Ruby's CSV library, but I'm getting an error:
CSV.open(file_name, "r", { :col_sep => "\t", :row_sep => "\n\r" }).each do |row|
puts row
end
CSV::MalformedCSVError: New line must be <"\n\r"> not <"\r"> in line 1.
The Windows row_sep is "\r\n" (CR then LF), not "\n\r". However, this CSV is malformed: looking at it in a hex editor, it appears to use "\r\r\n".
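Before choosing a row_sep, it helps to know exactly which bytes end a line. Here is a minimal sketch of that check in Ruby. line_ending is a hypothetical helper, not part of the CSV library, and it only looks at the first line of whatever sample you pass it.

```ruby
# Hypothetical helper (not part of Ruby's CSV library): return the run of
# CR/LF characters that ends the first line of `sample`, or nil if the
# sample contains no LF at all.
def line_ending(sample)
  first_line = sample[/\A[^\n]*\n/] # up to and including the first LF
  return nil unless first_line

  first_line[/[\r\n]+\z/] # trailing CR/LF run, e.g. "\r\n" or "\r\r\n"
end

p line_ending("a\tb\r\n")     # a normal Windows line ending
p line_ending("a\tb\t\r\r\n") # the ending this file appears to use
```

On the real file you could call it as p line_ending(File.binread(file_name, 4096)), with file_name as in the question.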
It's tab-delimited.
In addition, it does not use proper quoting: line 247 contains 600 "B" STREET STE. 2204, so you need to turn off the quote character.
quote_char: nil, col_sep: "\t", row_sep: "\r\r\n"
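The effect of quote_char: nil can be sketched on an in-memory string. The first row below is copied from the file; the second row is hypothetical and only reuses the line-247 address. With default quoting the stray quotes make CSV raise, while with quoting disabled the field comes through intact.

```ruby
require "csv"

# First row copied from the file; second row is hypothetical and only
# reuses the address from line 247.
data = "0000001904\tABRAHAM SECURITIES CORPORATION\t00829452\t" \
       "3724 47TH STREET CT. NW\t \tGIG HARBOR\tWA\t98335\t\r\r\n" \
       "0000009999\t600 \"B\" STREET STE. 2204\t\r\r\n"

# With the default quote character ("), a quote in the middle of an
# unquoted field is rejected as malformed.
default_error =
  begin
    CSV.parse(data, col_sep: "\t", row_sep: "\r\r\n")
    nil
  rescue CSV::MalformedCSVError => e
    e
  end
puts "default quoting: #{default_error&.message}"

# With quoting turned off, " is just an ordinary character.
rows = CSV.parse(data, quote_char: nil, col_sep: "\t", row_sep: "\r\r\n")
p rows.last[1]
```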
There's also an extra tab at the end: each line ends with \t\r\r\n. You can instead treat it as using a row_sep of "\r\n" with an extra "\r" field.
quote_char: nil, col_sep: "\t", row_sep: "\r\n"
Or you can view it as having a row_sep of "\t\r\r\n" and no extra field.
quote_char: nil, col_sep: "\t", row_sep: "\t\r\r\n"
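The difference between the "\r\r\n" and "\t\r\r\n" choices shows up in the field count. This sketch parses one line copied from the file both ways; it leaves out the "\r\n" variant so it only exercises separators that consume the whole line ending.

```ruby
require "csv"

# One line copied from the file, including its \t\r\r\n terminator.
line = "0000001904\tABRAHAM SECURITIES CORPORATION\t00829452\t" \
       "3724 47TH STREET CT. NW\t \tGIG HARBOR\tWA\t98335\t\r\r\n"

# row_sep "\r\r\n": the trailing tab leaves one extra, empty final field.
with_extra = CSV.parse(line, quote_char: nil, col_sep: "\t",
                             row_sep: "\r\r\n").first

# row_sep "\t\r\r\n": the trailing tab is swallowed by the row separator.
without_extra = CSV.parse(line, quote_char: nil, col_sep: "\t",
                                row_sep: "\t\r\r\n").first

p with_extra.size    # => 9
p without_extra.size # => 8
```

Which one to prefer is a matter of taste: "\t\r\r\n" gives clean rows, but it silently depends on every line having that trailing tab.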
Either way, it's a mess.
I used a hex editor to look at the file as text and raw data side by side. This let me see what's truly at the end of the line.
87654321 0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789abcdef
00000000: 3030 3030 3030 3139 3034 0941 4252 4148 0000001904.ABRAH
00000010: 414d 2053 4543 5552 4954 4945 5320 434f AM SECURITIES CO
00000020: 5250 4f52 4154 494f 4e09 3030 3832 3934 RPORATION.008294
00000030: 3532 0933 3732 3420 3437 5448 2053 5452 52.3724 47TH STR
00000040: 4545 5420 4354 2e20 4e57 0920 0947 4947 EET CT. NW. .GIG
00000050: 2048 4152 424f 5209 5741 0939 3833 3335 HARBOR.WA.98335
00000060: 090d 0d0a 3030 3030 3030 3233 3033 0950 ....0000002303.P
          ^^^^^^^^^
Hex 09 0d 0d 0a is \t\r\r\n.
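If no hex editor is at hand, Ruby can produce the same byte view. hex_bytes is a hypothetical helper; String#unpack1("H*") is shown as a built-in alternative that gives the same digits without spaces.

```ruby
# Hypothetical helper: format each byte as two hex digits, like the middle
# columns of a hex editor.
def hex_bytes(str)
  str.bytes.map { |b| format("%02x", b) }.join(" ")
end

puts hex_bytes("98335\t\r\r\n") # prints 39 38 33 33 35 09 0d 0d 0a
puts "\t\r\r\n".unpack1("H*")   # prints 090d0d0a
```

Against the real file, puts hex_bytes(File.binread(file_name, 112)) would cover the same 0x70 bytes as the dump above.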
Alternatively, you can print the lines with p, which shows any invisible characters as escape sequences.
File.open(file_name) do |f|
  p f.readline
end
"0000001904\tABRAHAM SECURITIES CORPORATION\t00829452\t3724 47TH STREET CT. NW\t \tGIG HARBOR\tWA\t98335\t\r\r\n"
Use :row_sep => :auto instead of :row_sep => "\n\r":
CSV.open(file_name, "r", :col_sep => "\t", :row_sep => :auto).each do |row|
puts row
end
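:auto is also CSV's default row_sep, so the option can simply be left out. The sketch below uses hypothetical, well-formed CRLF data (not this question's file) to show that :auto yields the same rows as the correct explicit "\r\n". Because auto-detection works from the line endings it finds in the data, it is worth checking the result on the real file, whose endings are \r\r\n.

```ruby
require "csv"

# Hypothetical well-formed data with Windows (CRLF) line endings.
data = "a\tb\r\nc\td\r\n"

auto_rows     = CSV.parse(data, col_sep: "\t", row_sep: :auto)
explicit_rows = CSV.parse(data, col_sep: "\t", row_sep: "\r\n")

p auto_rows # => [["a", "b"], ["c", "d"]]
```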