shimmoril

Reputation: 682

Reading a CSV w/ CFFile & Non-Roman Characters

Update: The original CSV was created in Excel; when I copied the data into a Google Spreadsheet and downloaded a CSV from Drive, it worked fine. I'm guessing there's an encoding issue w/ the Excel CSV? Is there any way to work around this w/ Excel, or do we need to tell our clients to use Google Docs?

I've got a CSV w/ non-ASCII characters (my example is in French, but we also support entirely non-Roman languages such as Arabic and Thai) that I'm reading via ColdFusion's cffile. The problem is that the read converts every accented character into a weird ? symbol (�). There was originally no charset specified on the cffile, so I tried adding utf-8 (no change) and utf-16 (everything turns into what looks like Chinese characters).

Anyone know how I can get this data out of the CSV without losing/messing up the characters?

CSV Example:

Smith,Joan,[email protected],Hôpital Jésus

Original cffile:

<cffile action="read" file="#expandedFilePath#" variable="strCSV">

cffile w/ charset added:

<cffile action="read" file="#expandedFilePath#" variable="strCSV" charset="utf-8">

cfdump of strCSV (no charset/utf-8 charset):

Smith,Joan,[email protected],H�pital J�sus

cfdump of strCSV (utf-16 charset):

卭楴栬䩯慮ⱪ潡渮獭楴桀瑥獴⹣潭ⱈ楴慬⁊畳ഊ

Upvotes: 4

Views: 954

Answers (1)

wiesion

Reputation: 2445

Excel, like most Windows programs, saves CSV files in the system's ANSI code page, which on Western European and US installs is CP-1252 (windows-1252). That is not UTF-8 and, importantly, also NOT ISO-8859-1, which is what most encoding guessers report. Have you already tried:

<cffile action="read" file="#expandedFilePath#" 
      variable="strCSV" 
      charset="windows-1252" />

If this works, can you rely on your inputs to always be default Windows files?
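If you can't rely on that, one option is to decode the bytes yourself and fall back when UTF-8 doesn't fit. The following is only a rough sketch, not something from your code: `readCsvText` and its `fallbackCharset` argument are hypothetical names, and it assumes `charsetEncode` substitutes U+FFFD for bytes that are not valid UTF-8 (worth verifying on your ColdFusion version). It cannot tell one single-byte code page from another, so an Arabic or Thai file saved by Excel would still need the right fallback passed in.

```cfml
<cfscript>
// Hypothetical helper: try UTF-8 first, fall back to a Windows code page.
function readCsvText(required string path, string fallbackCharset = "windows-1252") {
    // Read the raw bytes so the charset decision stays in our hands
    var bytes = fileReadBinary(arguments.path);
    var text = charsetEncode(bytes, "utf-8");

    // Assumption: invalid UTF-8 byte sequences come back as U+FFFD (65533),
    // the same replacement symbol seen in the cfdump above
    if (find(chr(65533), text)) {
        text = charsetEncode(bytes, arguments.fallbackCharset);
    }

    // Drop a leading byte order mark (U+FEFF) if one survived decoding
    if (len(text) && asc(left(text, 1)) == 65279) {
        text = removeChars(text, 1, 1);
    }
    return text;
}

strCSV = readCsvText(expandedFilePath);
</cfscript>
```

A file that legitimately contains U+FFFD would trigger the fallback by mistake, so treat this as a heuristic rather than real encoding detection.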

Upvotes: 1
