Reputation: 3691
As part of an integration task I am doing, I have a column in my database that has the type nvarchar(30) and I need call a WCF service and pass through the values in this column (among others) that will then be stored in a column that has the type varchar(30) in the other database.
What should I do (presumably in the code that calls the WCF service) to convert my strings into "varchar friendly" strings?
Update: Nothing has gone wrong yet. However I will be doing an initial migration of 120,000 records through this service and then about 300 new records will be pushed through this service each day. As such, manual intervention of any kind is very undesirable and I'm just thinking about what might go wrong in advance. I have no control over the target database (with VARCHAR column), however I do know that it is SQL Server and C# for the application (not sure if they're using ADO.NET).
Upvotes: 2
Views: 7843
Reputation: 37957
I often have a similar problem, and wanted to at least be able to preview what damage (if any) moving from NVarChar to VarChar was going to do to my data.
First, VarChar is more complicated than a single type. It has multiple subsets, which Sql Server calls Collations, and in UTF-world are roughly, Code Pages.* The default Collation for VarChar on my en-us machines is: SQL_Latin1_General_CP1_CI_AS.
This question's answers explain plenty about that: What does 'COLLATE SQL_Latin1_General_CP1_CI_AS' do?
The important takeaways are that it's standard ASCII 0-127, then the remaining bits in ASCII Extended have to be in a specific Code Page - in this case, Code Page 1252, which also happens to be the Visual Studio en-us default.
There's also an interesting detail that this default exists for backwards compatibility but, is harmful to performance and should be avoided (whoops, never knew, it's everywhere in my databases).
Now that we've covered too much about that, you should be able to look at your columns and determine your Code Page. I don't need to do any work in code to do that - all VarChar is the default (harmful) one above. So checking if a value in a column is going to lose info when converted from NVarChar to VarChar, in C#, on my system looks like this:
var enc = System.Text.Encoding.GetEncoding(1252);
string converted = enc.GetString(enc.GetBytes(s));
if (s != converted)
...react to the change/loss in info in the string
based on this: Converting UTF8 to Windows-1252
That's all you need for your work, but for the curious in my work I tested the above against real data and found 40 differing strings in 1000, so, converting this column to VarChar would lose info (maybe important) in 4% of rows I happened to sample. In my case, most of that change was Emojis to those blank blocks or question marks. Occasionally that emoji info was important to determining what the data meant. I decided to retain NVarChar and use the above check to see if data is going to be lost when the data heads to another system I know is VarChar. I might explore finding a library or something to convert common Emojis in that scenario like 🤷♂️ -> (shrug emoji).
*Collations are more complex than just what Code Page is in use - they mostly are about how columns are compared. That detail is outside the scope of this question, so, don't actually,,, me bro.
Upvotes: 0
Reputation: 96552
Your problem isn't the datatypes, it's the data. All possible values that can be stored as nvarchar do not have an equivalent in varchar (depending on the collation). If the second database is under your control, the most trouble free way to pass the data is to change the varchar field to nvarchar now.
If you can't do that, then you need to check the data before you send it to ensure it is data that can convert. Depending on the kinds of data issues you have you may want to strip out characters that won't convert or replace them with some other character or put the record into some type of holding table rather than sending it to have it manaully fixed. Ahmed's suggestion looks like a possibility, I haven't tried it so I don't know if it will work. You may be lucky and not have any data problems (not every database that uses nvarchar is using any characters that varchar with the proper country collation won't have), but you have to plan for the problem.
Upvotes: 1
Reputation: 8339
There is nothing you have to do : all strings in .NET are UTF-16 encoded :
NVARCHAR
column (using ADO .NET I presume), you automatically get a .NET string (UTF-16, any necessary conversion is automatic).VARCHAR
column, any necessary conversion from UTF-16 is automatically performed as well.See here for more information on data type mappings for Sql Server in ADO .NET.
You just have to make sure that all your strings can be converted flawlessly from the source character set to the destination character set.
Upvotes: 8
Reputation: 7238
If the destination column represents ascii characters so simply use Encoding.Convert method
example:
Encoding.ASCII.GetString(Encoding.Convert(Encoding.Unicode, Encoding.ASCII, bytes))
Upvotes: 1