Reputation: 3025
I am retrieving a large hash of results from a database query and writing them to a csv file. The code block below takes the results and creates the CSV. With the quote_char:
option it will replace the quotes with NULL characters which I need to properly create the tab-delimited file.
However, the NULL characters are getting converted into "" when they are loaded into their destination so I would like to remove those. If I leave out quote_char:
every field is double quoted which causes the same result.
How can I remove the NULL characters?
begin
CSV.open("#{file_path}"'file.tab', "wb", Options = {col_sep: "\t", quote_char: "\0"}) do |csv|
csv << ["Key","channel"]
series_1_results.each_hash do |series_1|
csv << ["#{series_1['key']}","#{series_1['channel']}"]
end
end
end
Upvotes: 9
Views: 10096
Reputation: 14408
From the Ruby CSV docs, setting force_quotes: false
in the options seems to work.
CSV.open("#{file_path}"'file.tab', "wb", { col_sep: "\t", force_quotes: false }) do |csv|
The above does the trick. I'd suggest against setting quote_char
to \0
since that doesn't work as expected.
There is one thing to note though. If the field is an empty string ""
- it will force the quote_char
to be printed into the CSV. But strangely a nil
value does not. I'd suggest that if at all you're expecting empty strings in the data, you somehow convert them into nil
when writing to the CSV (maybe using the ActiveSupport presence
method or anything similar).
Upvotes: 3
Reputation: 1449
As it is stated in the csv documentation you have to the set quote_char
to some character, and this character will always be used to quote empty fields.
It seems the only solution in this case is to remove used quote_chars
from the created csv file. You can do it like this:
quotedFile = File.read("#{file_path}"'file.tab')
unquotedFile = quotedFile.gsub("\0", "")
File.open("#{file_path}"'unquoted_file.tab',"w") { |file| file.puts replace }
I assume here that NULL's are the only escaped fields. If that's not the case use default quote_char: '"'
and gsub(',"",', '')
which should handle almost all possible cases of fields containing special characters.
But as you note that the results of your query are large it might be more practical to prepare the csv file on your own and avoid processing the outputs twice. You could simply write:
File.open("#{file_path}"'unquoted_file.tab',"w") do |file|
csv.puts ["Key","channel"]
series_1_results.each_hash do |series_1|
csv.puts ["#{series_1['key']},#{series_1['channel']}"]
end
end
Once more, you might need to handle fields with special characters.
Upvotes: 4
Reputation: 7399
If your input contains any data that needs to be escaped (such as the column separator or the quote character), then you do need to quote your data. Otherwise it cannot be parsed correctly later.
CSV.open('test.csv', 'wb', col_sep: "\t") do |csv|
csv << ["test", "'test'", '"test"', nil, "test\ttest"]
end
puts open('test.csv').read
#test 'test' """test""" "test test"
The CSV class won't quote anything unnecessarily (as you can see above). So I'm not sure why you're saying all your fields are being quoted. It could be somehow force_quotes
is getting set to true somewhere.
If you're absolutely certain your data will never contain \t
or "
, then the default quote_char
("
) should work just fine. Otherwise, if you want to avoid quoting anything, you'll need to pick another quote character that you're absolutely certain won't occur in your data.
CSV.open('test.csv', 'wb', col_sep: "\t", quote_char: "|") do |csv|
csv << ["test", "'test'", nil, '"test"']
end
puts open('test.csv').read
#test 'test' "test"
Upvotes: 1
Reputation: 160571
First, a tab-separated file is "TSV", vs. a comma-separated file which is "CSV".
Wrapping quotes around fields is necessary any time there could be an occurrence of the field delimiter inside a field.
For instance, how are you going to embed this string in a tab-delimited file?
Foo\tbar
The \t
is the representation of an embedded Tab.
The same problem occurs when writing a CSV file with a field containing commas. The field has to be wrapped in double-quotes to delimit the field itself.
Upvotes: 1