Sean

Reputation: 1522

Why are my Chinese characters not displayed correctly in a C# string?

I am storing Chinese and English text in an SQL Server 2005 database and displaying it on a webpage, but the Chinese is not being displayed correctly. I have been reading about the subject and have done the following:

Chinese characters are displayed correctly when I insert them directly into the page, i.e. when I don't get them from the database.

These are the characters that should be displayed: 全澳甲流确诊病例已破100

This is what is displayed when the text is retrieved from the database: 全澳甲æµç¡®è¯Šç—…ä¾‹å·²ç ´1001

This seems to be related to how strings are handled in C#, because the Chinese text is retrieved and displayed correctly in classic ASP.

Is there anything else I need to do to get the data out of the database, into a string and output correctly on an aspx page?

Upvotes: 4

Views: 28604

Answers (5)

russau

Reputation: 9098

How are the characters getting into the database? Are you entering them via a stored proc? Make sure the parameters on your stored proc are nvarchar, and that the parameters on the command object you call the proc from are nvarchar as well.
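As a rough illustration of declaring the command parameter as nvarchar, here is a minimal sketch. It only builds the command and never opens a connection. The table and column names (`TestTable`, `zhtext`) are the ones used in the query further down in this answer, while the helper name `BuildInsert`, the parameter name `@zhtext` and the length of 100 are assumptions made for the example.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

static class NVarCharParameterSketch
{
    // Builds (but does not execute) a parameterized INSERT whose value is typed as NVARCHAR.
    // BuildInsert, @zhtext and the length 100 are hypothetical; adjust them to your schema.
    public static SqlCommand BuildInsert(SqlConnection connection, string text)
    {
        var command = new SqlCommand(
            "INSERT INTO TestTable (zhtext) VALUES (@zhtext)", connection);

        // Declaring the parameter as NVarChar keeps the value Unicode on its way to the server.
        command.Parameters.Add("@zhtext", SqlDbType.NVarChar, 100).Value = text;
        return command;
    }

    static void Main()
    {
        // No connection is needed just to inspect the parameter type.
        using (var command = BuildInsert(null, "全澳甲流确诊病例已破100"))
        {
            Console.WriteLine(command.Parameters["@zhtext"].SqlDbType); // NVarChar
        }
    }
}
```

The same idea applies when the command calls a stored procedure: the parameter on the C# side and the parameter in the procedure should both be nvarchar.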

Update: the consensus on the thread is that the database doesn't contain properly encoded NVARCHAR content. Here's my latest theory: the database holds the UTF-8 bytes. These bytes remain untouched when they are output from ASP. ASP.NET takes the UTF-8 bytes and interprets them as single-byte characters.

Try getting the bytes out of the database and decoding them as UTF-8, e.g.:

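// Note: ExecuteScalar returns a byte[] only for a binary column; for an (n)varchar column it returns a string and this cast throws.
SqlCommand command = new SqlCommand("SELECT zhtext FROM TestTable", connection);
byte[] byteArray = (byte[])command.ExecuteScalar();
// Reinterpret the raw bytes as UTF-8 text.
lblText.Text = Encoding.UTF8.GetString(byteArray);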
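To make the theory concrete, the following sketch reproduces the effect without a database: it encodes the sample text as UTF-8, misreads those bytes as a single-byte encoding, and then reverses the mistake. ISO-8859-1 is used as the single-byte encoding purely as an assumption for the demonstration (the real data may have gone through a different code page), and the helper names `Garble` and `Repair` are hypothetical.

```csharp
using System;
using System.Text;

static class MojibakeSketch
{
    // ISO-8859-1 maps every byte 0-255 to one character, so the round trip is lossless.
    // Assumption: the real system may have used another single-byte code page.
    static readonly Encoding SingleByte = Encoding.GetEncoding("iso-8859-1");

    // Simulates the bug: UTF-8 bytes interpreted as single-byte characters.
    public static string Garble(string text)
    {
        return SingleByte.GetString(Encoding.UTF8.GetBytes(text));
    }

    // Reverses it: recover the original bytes, then decode them as UTF-8.
    public static string Repair(string garbled)
    {
        return Encoding.UTF8.GetString(SingleByte.GetBytes(garbled));
    }

    static void Main()
    {
        string original = "全澳甲流确诊病例已破100";
        string garbled = Garble(original);

        Console.WriteLine(garbled == original);         // False
        Console.WriteLine(Repair(garbled) == original); // True
    }
}
```

If the string read from the database behaves like `garbled` here, a repair step of this kind can recover the text for display, but fixing the stored data is the more durable option.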

Upvotes: 1

devio

Reputation: 37205

The summary for me looks like:

  • characters displayed correctly in ASP
  • characters displayed garbled in SSMS
  • characters displayed garbled in ASP.Net

Conclusion: the data in the database is not encoded correctly, and you need to migrate it to Unicode to work with it in C#, just as Ryan sketched.

Upvotes: 0

yinyueyouge

Reputation: 3754

So far the information is:

  1. You are using a direct SQL INSERT script to insert into the database.
  2. The data appears broken in the database.

The problem might lie in two places:

  1. In your INSERT statement, did you prefix the insert value with N?

    INSERT INTO #tmp VALUES (N'全澳甲流确诊病例已破100')

  2. If you prefix the value with N, does the String object hold the correct data?

    String sql = "INSERT INTO #tmp VALUES (N'" + value + "')";

Here I assume value is a String object.

Does this String object hold the correct Chinese characters?

Try printing its value to check.

Updated:

Let's assume the INSERT query is constructed as below:

String sql = "INSERT INTO #tmp VALUES (N'" + value + "')";

I assume value holds the Chinese characters.

Did you assign the Chinese characters to value directly? For example:

String value = "全澳甲流确诊病例已破100";

The above code should work. However, any intermediate processing you have added may cause problems.

I worked on a localized TC project before. The previous architect had added several encoding conversions that are necessary in ASP but cause problems in .NET:

  String value = "全澳甲流确诊病例已破100";
  Encoding tc = Encoding.GetEncoding("BIG5");
  byte[] bytes = tc.GetBytes(value);
  value = Encoding.Unicode.GetString(bytes);

The above conversions are unnecessary. In .NET, a simple direct assignment works:

  String value = "全澳甲流确诊病例已破100";

This works because String constants and the String object itself are Unicode.

When framework libraries such as file I/O read text that is not encoded in Unicode, they convert the foreign encoding to Unicode; in other words, the framework does this dirty work for you. Most of the time you do not need to perform manual encoding conversions.
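A small sketch of that point, under the assumption that the input bytes are UTF-8 (a `MemoryStream` stands in for a file here, and the helper name `ReadText` is hypothetical): a `StreamReader` given the right encoding hands back a normal Unicode string, while manually reinterpreting the same bytes with the wrong encoding, as in the BIG5 example above, corrupts the text.

```csharp
using System;
using System.IO;
using System.Text;

static class DecodingSketch
{
    // Lets the framework decode the bytes; the result is an ordinary Unicode string.
    public static string ReadText(byte[] data, Encoding encoding)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        string original = "全澳甲流确诊病例已破100";
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        // The reader does the conversion; no manual step is needed afterwards.
        Console.WriteLine(ReadText(utf8Bytes, Encoding.UTF8) == original);    // True

        // Reinterpreting the bytes as UTF-16 (Encoding.Unicode) garbles the text.
        Console.WriteLine(Encoding.Unicode.GetString(utf8Bytes) == original); // False
    }
}
```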

Update: Understood that ASP is used to insert data into an SQL server.

I have written a small piece of ASP to insert some Chinese chars into SQL database and it works.

I have a database named "trans" and I created a table "temp" inside. The ASP page is encoded in UTF-8.

<html>
<head>
<title>Untitled</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body>
<script language="vbscript" runat="server">

If Request.Form("Button1") = "Submit" Then

    SqlQuery = "INSERT INTO trans..temp VALUES (N'" + Request.Form("Text1") + "')"

    Set cn = Server.CreateObject("ADODB.Connection")
    cn.Provider = "sqloledb"
    cn.Properties("Data Source").Value = *********
    cn.Properties("Initial Catalog").Value = "TRANS"
    cn.Properties("User ID").Value = "sa"
    cn.Properties("Password").Value = **********
    cn.Properties("Persist Security Info").Value = False

    cn.Open
    cn.Execute(SqlQuery)
    cn.Close

    Set cn = Nothing

    Response.Write SqlQuery
End If

</script>
<form name="form1" method="post" action="input.asp">
    <input name="Text1" type="text" />
    <input name="Button1" value="Submit" type="submit" />
</form>        
</body>
</html>

The table is defined as follows in my database:

 create table temp (data NVARCHAR(100))

After submitting the ASP page several times, my table contains proper Chinese data:

select * from trans..temp

data
----------------
test
测试
全澳甲流确诊病例已破100

Hope this can help.

Upvotes: 6

Ali Shafai

Reputation: 5161

Have you installed the "support for eastern languages" in Windows? Is it XP? If so, your data might be fine and SQL Server Management Studio just isn't showing it properly. (All TrueType fonts display correctly even without the "support for Chinese", but system fonts don't.)

Upvotes: 0

Will Charczuk

Reputation: 919

This is definitely a problem with the encoding of the strings at some point in the round trip from the database to the C# string, but from the sound of it you're doing everything correctly.

In our database we store Unicode data in NVARCHAR() columns and then read it out into normal C# strings; no text encoding changes were necessary. What kind of data objects are you using (e.g. DataSets, just a DataReader, LINQ to SQL)?

In our application we read the results of the stored procedure using FetchDataSet, and then do a DataBinder.Eval() to assign the string that is eventually the text of a label.

Upvotes: 0
