jamesbar2
jamesbar2

Reputation: 614

Having trouble with UTF-8 storing in NVarChar in SQL Server 2008

I'm pulling data using System.Net.WebClient from a web site, and when the data comes back everything parses and looks good except letters with accents. For example, when it returns an é, SQL Server 2008 saves it as é.

Just need to figure out how to convert these UTF-8 characters into something SQL Server can read. I'm storing it in an NVARCHAR(MAX) datatype.

I'm using Linq-to-SQL to insert into the database if you were curious.

Any thoughts on what I could do to convert it to the proper format?

Upvotes: 0

Views: 10956

Answers (2)

jamesbar2
jamesbar2

Reputation: 614

Figured it out! When using the WebClient class, I was downloading the data as a string.

My Original Configuration...

System.Net.WebClient wc = new WebClient();
string htmlData = wc.DownloadString(myUri);

I tried to convert this data into a UTF-16...from it's current string, but since Microsoft operates in UTF-16, it had handled the conversion on its own.

Instead, I switched my approach to reading the actual byte[] array from the data like so...

System.Net.WebClient wc = new WebClient();
string htmlData = UTFConvert(wc.DownloadData(myUri));

private string UTFConvert(byte[] utfBytes)
{
    byte[] isoBytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utfBytes);
    return Encoding.Unicode.GetString(isoBytes);
}

This fixed the problem, and SQL correctly see's the accents in everything now. Yippee.

Cheers all, and thanks for your help!

Upvotes: 4

Remus Rusanu
Remus Rusanu

Reputation: 294487

Description of storing UTF-8 data in SQL Server. There is also a discussion of this topic at International Features in Microsoft SQL Server 2005. the gist of it is: SQL Server has no support for UTF-8. Feel free to upvote the request to Add support for storing UTF-8 natively in SQL Server.

As a note though, since you store Unicode string via LINQ, this would point that the problem occurs before writing into SQL Server. Namely your web pulling, does it appropriately convert the data read using an UTF-8 reader? Namely, do you read the WebResponse.GetResponseStream() via a StreamReader constructed with the appropriate UTF8Encoding? That should create the proper Unicode string and then the NVARCHAR storage in the DB (which is UCS-2) should be fine.

Upvotes: 3

Related Questions