Baz R

Reputation: 97

Non-ASCII CSV delimiters ignored in Unix

I am trying to create a CSV file from data retrieved from a database. The data itself contains commas, pipes and any number of other common delimiters. We have chosen to use the non-ASCII broken pipe symbol ¦ as the delimiter, and it also has to be present in XML config files and Java test files.

When our Java files are deployed to Unix, the compiler complains that an invalid character was found, I guess because it is finding a non-ASCII character in files it expects to be ASCII.

So we converted the files to UTF-8, but on Windows this displays the ¦ character as �. We then copied the broken pipe symbol from a UTF-8 web page, and the code now compiles fine on both Windows and Unix. However, the tests pass on Windows but fail on Unix, because there the ¦ is being interpreted as Â¦.

Can anyone advise how I should handle these files and what format they should be in?

The only other solution I can think of right now is to use a combination of ASCII characters as the delimiter, which I don't really want to do.

Thanks in advance

Upvotes: 0

Views: 507

Answers (1)

Martin Serrano

Reputation: 3775

The general approach is to quote any field that may contain the delimiter. A quote embedded in a field is then represented by two consecutive quotes. This probably requires more pre- and post-processing than you are currently doing, but it will make your code more robust (what happens if a database field starts containing the broken pipe character?).

The opencsv project can handle this use case.
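To make the quoting rule concrete, here is a minimal sketch that does not depend on any library. The class and method names (`CsvQuoting`, `quoteField`, `joinRow`, `parseRow`) are made up for illustration and are not opencsv's API. It assumes one record per line and uses a plain ASCII comma, since with quoting in place the delimiter no longer has to be a character that never occurs in the data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of opencsv.
final class CsvQuoting {

    // Wrap a field in double quotes when it contains the delimiter, a quote
    // or a line break. Embedded quotes are written as two quotes.
    static String quoteField(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Build one CSV line from a list of raw field values.
    static String joinRow(List<String> fields, char delimiter) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                row.append(delimiter);
            }
            row.append(quoteField(fields.get(i), delimiter));
        }
        return row.toString();
    }

    // Split one CSV line back into raw field values.
    // Assumption: the record does not span several lines.
    static List<String> parseRow(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote is a literal quote
                    i++;
                } else {
                    inQuotes = false;    // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

With this, `CsvQuoting.joinRow(Arrays.asList("a,b", "x|y"), ',')` keeps the comma inside the first field, and `parseRow` recovers the original values. Because the source then contains only ASCII, the file-encoding problem between Windows and Unix does not come up for the delimiter itself. For production use, a maintained library such as opencsv is likely the safer choice.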

Upvotes: 1
