Reputation: 21105
I'm developing a part of an application that's responsible for exporting some data into CSV files. The application always uses UTF-8, because of its multilingual nature, at all levels. But opening such CSV files (containing e.g. diacritics, Cyrillic letters, Greek letters) in Excel does not achieve the expected results: it shows something like Г„/Г¤, Г–/Г¶ instead. And I don't know how to force Excel to understand that the opened CSV file is encoded in UTF-8. I also tried specifying the UTF-8 BOM EF BB BF, but Excel ignores that.
Is there any workaround?
P.S. Which other tools might behave the same way Excel does?
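For illustration, here is a sketch of the kind of export and BOM attempt described above (Python assumed; the file name and sample values are made up):
# Write a UTF-8 CSV and prepend the EF BB BF byte order mark.
rows = ['Ä;ä;Ö;ö', 'Αθήνα;Москва']
with open('export.csv', 'wb') as f:
    f.write(b'\xef\xbb\xbf')  # UTF-8 BOM
    f.write('\n'.join(rows).encode('utf-8'))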
UPDATE
I have to say that I've confused the community with the formulation of the question. When I asked it, I was looking for a way to open a UTF-8 CSV file in Excel without any problems for a user, in a fluent and transparent way. However, I used the wrong formulation and asked for a way to do it automatically, which is confusing and clashes with VBA macro automation. There are two answers to this question that I appreciate the most: the very first answer by Alex, which I accepted, and a second one by Mark that appeared a little later. From a usability point of view, Excel seemed to lack good, user-friendly UTF-8 CSV support, so I consider both answers correct. I accepted Alex's answer first because it stated plainly that Excel was not able to do this transparently, which is what I meant by "automatically". Mark's answer promotes a more complicated way for more advanced users to achieve the expected result. Both answers are great, but Alex's fits my not clearly specified question a little better.
UPDATE 2
Five months after the last edit, I noticed that Alex's answer has disappeared for some reason. I really hope it wasn't a technical issue, and I hope there is no more discussion about which answer is better now. So I'm accepting Mark's answer as the best one.
Upvotes: 644
Views: 705470
Reputation: 46341
15 years later, I finally found the solution: write the CSV in UTF-16-LE without a BOM, and it will work in Excel 2007+.
Example with Python:
import csv
with open('test.csv', "w", encoding="utf-16-le", newline='') as csvfile:
    w = csv.writer(csvfile, delimiter=';')
    w.writerow(["abc", "def"])
    w.writerow(["été", "hiver"])
The output CSV file can be opened directly with Excel, and: (1) the accented characters (é) are displayed correctly, and (2) the columns are properly separated (thanks to the ; delimiter).
Note: if using encoding="utf-16" instead of utf-16-le, (1) will still be OK, but not (2).
Upvotes: 2
Reputation: 1014
For those who can afford to convert a few files manually, there are some answers here mentioning how to convert them with Notepad++, but I wanted to add the solution with VS Code (Visual Studio Code / VSCode): open the CSV file, click the current encoding shown in the status bar at the bottom, choose "Save with Encoding", and pick "UTF-8 with BOM".
Upvotes: 2
Reputation: 712
A drastic change I made: instead of providing .csv files to my users, I provide .xlsx :)
Since I was programmatically generating the CSV file, and since only humans manipulate those files, there is no reason to keep using the plain .csv format (which is more appropriate for parsing). Note that .xlsx files are widely supported (not only in Excel).
I did not have to change all the logic: I just took my previous CSV generation script and added an Excel library to convert the CSV to XLSX in a few lines. You should be able to do the same no matter the language :)
Ideally I would have preferred to avoid Microsoft-specific formats, but sometimes you just settle on the "standard".
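As an illustration only (assuming Python and the openpyxl library; the file names are made up), the CSV-to-XLSX conversion can indeed be a few lines:
import csv
from openpyxl import Workbook

# Read the rows produced by the existing CSV generation step...
with open('export.csv', newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))

# ...and append them to a fresh .xlsx workbook instead.
wb = Workbook()
ws = wb.active
for row in rows:
    ws.append(row)
wb.save('export.xlsx')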
Upvotes: 0
Reputation: 22696
In Python, use encoding='utf-8-sig', which is Python's name for UTF-8 with BOM. Just utf-8 will not get picked up by Excel or other Microsoft software.
From https://docs.python.org/3/library/codecs.html:
To increase the reliability with which a UTF-8 encoding can be detected, Microsoft invented a variant of UTF-8 (that Python calls "utf-8-sig")
See also What is the difference between utf-8 and utf-8-sig?
Using the standard Python csv lib, this would be:
import csv

with open('some.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
It also works with other libs such as pandas:
df.to_csv('some.csv', encoding='utf-8-sig')
Upvotes: 3
Reputation: 156459
The UTF-8 Byte-order mark (BOM) will clue Excel 2007+ in to the fact that you're using UTF-8. (See this SO post).
In case anybody is having the same issues I was, .NET's UTF8 encoding class does not output a byte-order marker in a GetBytes() call. You need to use streams (or use a workaround) to get the BOM to output.
Upvotes: 233
Reputation: 3171
Just sharing a comprehensive function that might make your life easier when working with CSV files... please note the last function argument in relation to this topic.
function array2csv($data, $file = '', $download = true, $mode = 'w+', $delimiter = ',', $enclosure = '"', $escape_char = "\\", $addUnicodeBom = false)
{
    $return = false;
    if ($file == '') {
        $f = fopen('php://memory', 'r+');
    } else {
        $f = fopen($file, $mode);
    }
    if ($addUnicodeBom) {
        $utf8_with_bom = chr(239) . chr(187) . chr(191);
        fwrite($f, $utf8_with_bom);
    }
    foreach ($data as $line => $item) {
        fputcsv($f, $item, $delimiter, $enclosure, $escape_char);
    }
    rewind($f);
    if ($download == true) {
        $return = stream_get_contents($f);
    } else {
        $return = true;
    }
    return $return;
}
Upvotes: 0
Reputation: 4134
Found a solution for ASP.NET Core to download CSVs as UTF-8 with BOM:
// Encode the CSV text as UTF-8 (not Encoding.Default), so the bytes match the BOM and the declared charset.
byte[] csvBytes = Encoding.UTF8.GetBytes(csvString);
UTF8Encoding utf8 = new UTF8Encoding(true);
byte[] bom = utf8.GetPreamble();
var result = bom.Concat(csvBytes).ToArray();
return new FileContentResult(result, MediaTypeHeaderValue.Parse("text/csv; charset=utf-8"));
Excel then recognizes the downloaded CSV file as UTF-8.
Upvotes: 2
Reputation: 419
You can convert a .csv file to UTF-8 with BOM via Notepad++:
Encoding → Convert to UTF-8-BOM
File → Save
Worked in Microsoft Excel 2013 (15.0.5093.1000) MSO (15.0.5101.1000) 64-bit from Microsoft Office Professional Plus 2013 on Windows 8.1 with the locale for non-Unicode programs set to "German (Germany)".
Upvotes: 28
Reputation: 754
It's March 2022, and it seems we cannot use both a BOM and the sep=... line. Adding sep=\t or similar makes Excel ignore the BOM.
Using a semicolon as the separator seems to be a default Excel understands, in which case we can skip the sep=... line and it works.
This is Microsoft 365 with Excel version 2110 build 14527.20276.
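A minimal sketch of that combination (Python assumed; file name and values are made up): a UTF-8 BOM via utf-8-sig, semicolon-delimited rows, and no sep=... line.
import csv

# UTF-8 with BOM ('utf-8-sig'), ';' as delimiter, and no "sep=..." line.
with open('report.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f, delimiter=';')
    writer.writerow(['Name', 'Ort'])
    writer.writerow(['Müller', 'Köln'])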
Upvotes: 1
Reputation: 176
Working solution for Office 365: encode the file as UTF-16 (plain UTF-16, not LE or BE) and use a tab (\t) as the separator.
Code in PHP
$header = ['číslo', 'vytvořeno', 'ěščřžýáíé'];
$fileName = 'excel365.csv';

// Write the tab-separated content (UTF-8 at this point).
$fp = fopen($fileName, 'w');
fputcsv($fp, $header, "\t");
fclose($fp);

// Read the file back and convert it to UTF-16.
$handle = fopen($fileName, "r");
$contents = fread($handle, filesize($fileName));
$contents = iconv('UTF-8', 'UTF-16', $contents);
fclose($handle);

// Overwrite the file with the UTF-16 content.
$handle = fopen($fileName, "w");
fwrite($handle, $contents);
fclose($handle);
Upvotes: 2
Reputation: 523
In PHP you just prepend $bom to your $csv_string:
$bom = sprintf( "%c%c%c", 239, 187, 191); // EF BB BF
file_put_contents( $file_name, $bom . $csv_string );
Tested with MS Excel 2016, php 7.2.4
Upvotes: 6
Reputation: 2127
I faced the same problem a few days ago and could not find any solution, because I cannot use the import-from-CSV feature: it makes everything styled as a string.
My solution was to first open the file with Notepad++ and change the encoding to ASCII.
Then I just opened the file in Excel and it worked as expected.
Upvotes: 2
Reputation: 4581
We have used this workaround:
Upvotes: 63
Reputation: 129
I tried everything I could find in this thread and similar ones; nothing worked fully. However, importing into Google Sheets and simply downloading as CSV worked like a charm. Try it if you reach my frustration point.
Upvotes: 0
Reputation: 8138
It is incredible that there are so many answers but none answers the question:
"When I was asking this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user,..."
The answer marked as accepted, with 200+ upvotes, is useless for me because I don't want to give my users a manual on how to configure Excel. Apart from that, such a manual would apply to one Excel version, while other Excel versions have different menus and configuration dialogs. You would need a manual for each Excel version.
So the question is: how do you make Excel show UTF-8 data with a simple double click?
Well, at least in Excel 2007 this is not possible if you use CSV files, because the UTF-8 BOM is ignored and you will see only garbage. This is already part of the question by Lyubomyr Shaydariv:
"I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that."
I had the same experience: writing Russian or Greek data into a UTF-8 CSV file with BOM results in garbage in Excel:
Content of UTF8 CSV file:
Colum1;Column2
Val1;Val2
Авиабилет;Tλληνικ
Result in Excel 2007:
A solution is to not use CSV at all. This format is implemented so stupidly by Microsoft that it depends on the region settings in the Control Panel whether a comma or a semicolon is used as the separator. So the same CSV file may open correctly on one computer but not on another. "CSV" means "Comma Separated Values", but, for example, on a German Windows the semicolon must be used as the separator by default, while the comma does not work. (There it should rather be named SSV = Semicolon Separated Values.) CSV files cannot be interchanged between different language versions of Windows. This is an additional problem on top of the UTF-8 problem.
Excel has existed for decades. It is a shame that Microsoft was not able to implement such a basic thing as CSV import in all these years.
However, if you put the same values into an HTML file and save that file as UTF-8 with BOM and with the file extension XLS, you will get the correct result.
Content of UTF8 XLS file:
<table>
<tr><td>Colum1</td><td>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
</table>
Result in Excel 2007:
You can even use colors in HTML which Excel will show correctly.
<style>
.Head { background-color:gray; color:white; }
.Red { color:red; }
</style>
<table border=1>
<tr><td class=Head>Colum1</td><td class=Head>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td class=Red>Авиабилет</td><td class=Red>Tλληνικ</td></tr>
</table>
Result in Excel 2007:
In this case only the table itself has a black border and lines. If you want ALL cells to display gridlines this is also possible in HTML:
<html xmlns:x="urn:schemas-microsoft-com:office:excel">
<head>
<meta http-equiv="content-type" content="text/plain; charset=UTF-8"/>
<xml>
<x:ExcelWorkbook>
<x:ExcelWorksheets>
<x:ExcelWorksheet>
<x:Name>MySuperSheet</x:Name>
<x:WorksheetOptions>
<x:DisplayGridlines/>
</x:WorksheetOptions>
</x:ExcelWorksheet>
</x:ExcelWorksheets>
</x:ExcelWorkbook>
</xml>
</head>
<body>
<table>
<tr><td>Colum1</td><td>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
</table>
</body>
</html>
This code even allows you to specify the name of the worksheet (here "MySuperSheet").
Result in Excel 2007:
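As a sketch only (Python assumed; the file name is made up), writing such an HTML table as UTF-8 with BOM and an .xls extension could look like this:
# Write the HTML table as UTF-8 with BOM ('utf-8-sig' prepends EF BB BF)
# and give the file an .xls extension so Excel opens it directly.
html = """<table>
<tr><td>Colum1</td><td>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
</table>"""

with open('report.xls', 'w', encoding='utf-8-sig') as f:
    f.write(html)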
Upvotes: 85
Reputation: 1261
Hi, I'm using Ruby on Rails for CSV generation. In our application we plan to go multi-language (I18n), and we faced an issue while viewing I18n content in the CSV file in Windows Excel.
It was fine with Linux (Ubuntu) and Mac.
We identified that in Windows Excel the data needs to be imported again to view the actual data. While importing, we get more options to choose the character set.
But this can't be taught to each and every user, so the solution we were looking for was to open the file just by double-clicking.
Then we identified a way of showing the data using the open mode and BOM in Windows Excel, with the help of the aghuddleston gist. Added as a reference.
Example I18n content
In Mac and Linux
Swedish : Förnamn English : First name
In Windows
Swedish : Förnamn English : First name
def user_information_report(report_file_path, user_id)
  user = User.find(user_id)
  I18n.locale = user.current_lang
  open_mode = "w+:UTF-16LE:UTF-8"
  bom = "\xEF\xBB\xBF"
  body user, open_mode, bom
end

def headers
  headers = [
    "ID", "SDN ID",
    I18n.t('sys_first_name'), I18n.t('sys_last_name'), I18n.t('sys_dob'),
    I18n.t('sys_gender'), I18n.t('sys_email'), I18n.t('sys_address'),
    I18n.t('sys_city'), I18n.t('sys_state'), I18n.t('sys_zip'),
    I18n.t('sys_phone_number')
  ]
end

def body tenant, open_mode, bom
  File.open(report_file_path, open_mode) do |f|
    csv_file = CSV.generate(col_sep: "\t") do |csv|
      csv << headers
      tenant.patients.find_each(batch_size: 10) do |patient|
        csv << [
          patient.id, patient.patientid,
          patient.first_name, patient.last_name, "#{patient.dob}",
          "#{translate_gender(patient.gender)}", patient.email, "#{patient.address_1.to_s} #{patient.address_2.to_s}",
          "#{patient.city}", "#{patient.state}", "#{patient.zip}",
          "#{patient.phone_number}"
        ]
      end
    end
    f.write bom
    f.write(csv_file)
  end
end
The important things to note here are the open mode and the BOM:
open_mode = "w+:UTF-16LE:UTF-8"
bom = "\xEF\xBB\xBF"
Before writing the CSV, insert the BOM:
f.write bom
f.write(csv_file)
Windows and Mac
File can be opened directly by double clicking.
Linux (Ubuntu)
While opening the file it asks for the separator options -> choose "TAB".
Upvotes: 2
Reputation: 8708
I am generating csv files from a simple C# application and had the same problem. My solution was to ensure the file is written with UTF8 encoding, like so:
// Use UTF8 encoding so that Excel is ok with accents and such.
using (StreamWriter writer = new StreamWriter(path, false, Encoding.UTF8))
{
SaveCSV(writer);
}
I originally had the following code, with which accents looked fine in Notepad++ but were getting mangled in Excel:
using (StreamWriter writer = new StreamWriter(path))
{
SaveCSV(writer);
}
Your mileage may vary - I'm using .NET 4 and Excel from Office 365.
Upvotes: 0
Reputation: 3027
As I posted on http://thinkinginsoftware.blogspot.com/2017/12/correctly-generate-csv-that-excel-can.html:
Tell the software developer in charge of generating the CSV to correct it. As a quick workaround you can use gsed to insert the UTF-8 BOM at the beginning of the string:
gsed -i '1s/^\(\xef\xbb\xbf\)\?/\xef\xbb\xbf/' file.csv
This command inserts the UTF-8 BOM if it is not already present, so it is idempotent. Now you should be able to double-click the file and open it in Excel.
Upvotes: 11
Reputation: 529
This is not accurately addressing the question, but since I stumbled across this thread and the above solutions didn't work for me or had requirements I couldn't meet, here is another way to add the BOM when you have access to Vim:
vim -e -s +"set bomb|set encoding=utf-8|wq" filename.csv
Upvotes: 3
Reputation: 474
A truly amazing list of answers, but since one pretty good one is still missing, I'll mention it here: open the csv file with google sheets and save it back to your local computer as an excel file.
In contrast to Microsoft, Google has managed to support UTF-8 csv files so it just works to open the file there. And the export to excel format also just works. So even though this may not be the preferred solution for all, it is pretty fail safe and the number of clicks is not as high as it may sound, especially when you're already logged into google anyway.
Upvotes: 5
Reputation: 290
Yes, this is possible. As previously noted by multiple users, there seems to be a problem with Excel reading the correct byte order mark when the file is encoded in UTF-8. With UTF-16 it does not seem to have a problem, so the issue is specific to UTF-8. The solution I use for this is adding the BOM, TWICE. For this I execute the following sed command twice:
sed -I '1s/^/\xef\xbb\xbf/' *.csv
, where the wildcard can be replaced with any file name. However, this mangles the sep= line at the beginning of the .csv file. The .csv file will then open normally in Excel, but with an extra row with "sep=" in the first cell. The "sep=" can also be removed in the source .csv itself, but when opening the file with VBA the delimiter should be specified:
Workbooks.Open(name, Format:=6, Delimiter:=";", Local:=True)
Format 6 is the .csv format. Set Local to true, in case there are dates in the file. If Local is not set to true the dates will be Americanized, which in some cases will corrupt the .csv format.
Upvotes: 3
Reputation: 2149
If you want to make it fully automatic (one click), or to load it automatically into Excel from, say, a web page, but cannot generate proper Excel files, then I would suggest looking at the SYLK format as an alternative. OK, it is not as simple as CSV, but it is text based, very easy to implement, and it supports UTF-8 with no issues.
I wrote a PHP class that receives the data and outputs a SYLK file which will open directly in Excel by just clicking the file (or will auto-launch Excel if you write the file to a web page with the correct MIME type). You can even add formatting (like bold, specific number formats, etc.), change column sizes, or auto-size columns to the text in them; all in all the code is probably not more than about 100 lines.
It is dead easy to reverse-engineer SYLK by creating a simple spreadsheet, saving it as SYLK and then reading it with a text editor. The first block contains headers and standard number formats that you will recognise (which you just regurgitate in every file you create); then the data is simply an X/Y coordinate and a value.
Upvotes: 0
Reputation: 341
Had the same problems with PHP-generated CSV files. Excel ignored the BOM when the separator was defined via "sep=,\n" at the beginning of the content (but of course after the BOM).
So adding a BOM ("\xEF\xBB\xBF") at the beginning of the content and setting the semicolon as separator via fputcsv($fh, $data_array, ";"); does the trick.
Upvotes: 27
Reputation: 1200
Just to help users who reach this thread, like me, and are interested in opening the file in Excel:
I used the wizard below and it worked fine for me, importing a UTF-8 file. Not transparent, but useful if you already have the file.
Source: https://www.itg.ias.edu/content/how-import-csv-file-uses-utf-8-character-encoding-0
Upvotes: 3
Reputation: 359
Old question but heck, the simplest solution is:
Upvotes: 19
Reputation: 946
Yes, it is possible. When writing the stream that creates the CSV, the first thing to do is this:
myStream.Write(Encoding.UTF8.GetPreamble(), 0, Encoding.UTF8.GetPreamble().Length)
Upvotes: 3
Reputation: 31
This is my working solution:
vbFILEOPEN = "your_utf8_file.csv"
Workbooks.OpenText Filename:=vbFILEOPEN, DataType:=xlDelimited, Semicolon:=True, Local:=True, Origin:=65001
The key is Origin:=65001 (code page 65001 is UTF-8).
Upvotes: 3
Reputation: 279195
I have had the same issue in the past (how to produce files that Excel can read, and other tools can also read). I was using TSV rather than CSV, but the same problem with encodings came up.
I failed to find any way to get Excel to recognize UTF-8 automatically, and I was not willing or able to inflict on the consumers of the files complicated instructions on how to open them. So I encoded them as UTF-16LE (with a BOM) instead of UTF-8. Twice the size, but Excel can recognize the encoding. And they compress well, so the size rarely (but sadly not never) matters.
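A minimal sketch of that approach (Python assumed; file name and data are made up). Python's utf-16 codec writes a BOM and uses the platform's native byte order, which is little-endian on typical machines:
import csv

# Tab-separated, UTF-16 with BOM; Excel detects both the encoding and the tabs.
with open('report.tsv', 'w', encoding='utf-16', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(['Namn', 'Stad'])
    writer.writerow(['Åsa', 'Göteborg'])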
Upvotes: 14
Reputation: 1151
The bug with the ignored BOM seems to be fixed in Excel 2013. I had the same problem with Cyrillic letters, but adding the BOM character \uFEFF did help.
Upvotes: 97