I'm storing some HTML-encoded data in a SQL Server database, and I've written a script that outputs the data in CSV format minus the HTML tags. I'm getting a weird issue when HTML-decoding the remaining data. For example, the data contains a quote character (HTML-encoded as &rsquo;), but when I HTML-decode it, the data comes out as a series of weird characters (’). Does anyone know how to solve this issue? The output encoding of the page is UTF-8, if that helps.
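To see what the export described above actually produces, here is a minimal Python sketch of that pipeline. The sample row, the regex tag stripper and the use of Python at all are assumptions for illustration (the original script's language isn't stated). It strips tags, decodes entities with the standard library's html.unescape, writes CSV, and shows the UTF-8 bytes that end up in the file.

```python
import csv
import html
import io
import re

# Hypothetical row as it might come back from the database: HTML markup
# plus an entity (&rsquo; is U+2019 RIGHT SINGLE QUOTATION MARK).
rows = [("1", "<p>Tim&rsquo;s <b>report</b></p>")]

# Naive tag stripper; good enough for simple markup, not a real HTML parser.
TAG_RE = re.compile(r"<[^>]+>")

def clean(value: str) -> str:
    """Strip tags first, then decode HTML entities into real characters."""
    return html.unescape(TAG_RE.sub("", value))

buffer = io.StringIO()
writer = csv.writer(buffer)  # default line terminator is "\r\n"
for row_id, body in rows:
    writer.writerow([row_id, clean(body)])

text = buffer.getvalue()
data = text.encode("utf-8")  # the bytes that would be written to the .csv file
print(data)  # b'1,Tim\xe2\x80\x99s report\r\n'
```

The decoded quote becomes the three bytes E2 80 99 in the output, which is correct UTF-8; whether it then looks right depends on how the program opening the file interprets those bytes.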
Any advice would be much appreciated!
Cheers
Tim
Upvotes: 0
Views: 3772
Reputation: 14763
Those 3 weird characters are how UTF-8 encodes the character that the HTML entity &rsquo; stands for (’, U+2019 RIGHT SINGLE QUOTATION MARK). They're actually the octets 0xE2 0x80 0x99, and those bytes render as "’" in your computer's default charset, windows-1252. So I don't think you've got an issue with your encoding.
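The byte-level explanation above can be reproduced in a few lines of Python (used here purely as an illustration; "cp1252" is Python's codec name for windows-1252). It encodes the quote as UTF-8, misreads those bytes as windows-1252 to get the weird characters, and then shows that the damage is reversible.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, i.e. what &rsquo; decodes to.
quote = "\u2019"

utf8_bytes = quote.encode("utf-8")
print(utf8_bytes.hex(" "))  # e2 80 99

# Reading those same three bytes as windows-1252 yields one character
# per byte: 0xE2 -> â, 0x80 -> €, 0x99 -> ™.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # â€™

# Nothing is lost: re-encode as cp1252, then decode as UTF-8.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == quote)  # True
```

The data itself is fine; only the reader's choice of charset is wrong.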
Excel 2000 evidently has a known problem opening .csv files encoded in UTF-8. The solution, bizarrely enough, is to change the file extension to .txt, at which point Excel 2000 will apparently import the file correctly.
Upvotes: 3
Reputation: 109
If the data is read from the CSV file, open the CSV file in Notepad, choose Save As from the File menu, select UTF-8 in the Encoding drop-down, and save the file.
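One plausible reason the Notepad step helps (an assumption, not stated in the answer): older versions of Notepad write a UTF-8 byte order mark (EF BB BF) at the start of the file when you choose UTF-8, and Excel can use that mark to detect the encoding. If you control the export script, a minimal Python sketch can write the BOM directly with the standard "utf-8-sig" codec. The rows and the file name export.csv are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical rows, already stripped of tags and HTML-decoded.
rows = [["id", "text"], ["1", "Tim\u2019s report"]]

# Hypothetical output location.
path = os.path.join(tempfile.gettempdir(), "export.csv")

# "utf-8-sig" prepends the UTF-8 byte order mark (EF BB BF) on the first write;
# newline="" lets the csv module control line endings itself.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path, "rb") as f:
    raw = f.read()

print(raw[:3])  # b'\xef\xbb\xbf'
```

Whether a BOM is enough for a version as old as Excel 2000 is not confirmed here; the .txt import route from the previous answer remains the fallback.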
Upvotes: 0