Cát Tường Vy
Cát Tường Vy

Reputation: 398

Inserting XML file with encoding SHIFT-JIS error

I have an XML file with SHIFT-JIS encoding as below:

<?xml version="1.0" encoding="SHIFT-JIS" standalone="yes"?>
<海外管理ファイル><PO番号>GV05097</PO番号><データベース><PO><Tbl_PO_H PO番号="GV05097"><DATA><PO番号 TYPE="200" LENGTH="13">GV05097</PO番号></DATA></Tbl_PO_H></PO></データベース></海外管理ファイル>

And I use SQL store procedure to insert it into SQL table:

alter PROCEDURE [dbo].[proc_TBL_PO_H_LoadXMLPO]
@xml  XML
AS  
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO [ENVIETNAMPO].[dbo].[TBL_PO_H]
              SELECT
                  TBL_PO_H.value('(PO番号/text())[1]','varchar(13)') AS PO番号, --TAG
                  TBL_PO_H.value('(PO発行日/text())[1]','varchar(10)') AS PO発行日                  
              FROM
                 @xml.nodes('/海外管理ファイル/データベース/PO/Tbl_PO_H/DATA')AS TEMPTABLE(TBL_PO_H)      
    END

C# code for Load XML button:

string pathUser = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
            string pathDesktop = Path.Combine(pathUser, "Desktop");
            var xmlfilename = string.Empty;
            var xmlfilePath = string.Empty;
            //var dt = new DataTable();           
            var sqlConn = new SqlConnection(strConStr);
            try
            {
                    openFileDialog1.InitialDirectory = @pathDesktop;
                    openFileDialog1.Title = "Browse XML PO File";
                    openFileDialog1.Filter = "XML files (*.xml)|*.xml|All files (*.*)|*.*";
                    openFileDialog1.CheckFileExists = true;
                    openFileDialog1.CheckPathExists = true;
                    openFileDialog1.ShowHelp = true;
                    openFileDialog1.FileName = "*.xml";
                if (openFileDialog1.ShowDialog() == DialogResult.OK)
                {
                    xmlfilename =openFileDialog1.SafeFileName;
                    xmlfilePath = pathDesktop +"\\"+ xmlfilename;
                    string xml = File.ReadAllText(xmlfilePath, Encoding.GetEncoding("SHIFT-JIS"));
                    sqlConn.Open();

                    var cmd = new SqlCommand("proc_TBL_PO_H_LoadXMLPO", sqlConn);
                    cmd.CommandType = CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@xml", xml);
                    SqlCommand arithabortCommand = new SqlCommand("SET ARITHABORT ON", sqlConn);
                    arithabortCommand.ExecuteNonQuery();  
                    cmd.ExecuteNonQuery(); 
                    sqlConn.Close();
                }

                    MessageBox.Show("PO XML File has been imported successfully.", "Information",
                    MessageBoxButtons.OK,
                    MessageBoxIcon.Information);
            }
            catch (Exception ex)
            {

                MessageBox.Show(ex.ToString());
            }

But when loading, an error occur as the following, I tried to read XML file with correct encoding as the XML file. Please help me. Thank you!

System.Data.SqlClient.SqlException 0x80131904: parsing XML Line 1, character 59. Cannot swicth I- code ....

Upvotes: 0

Views: 1516

Answers (1)

Gottfried Lesigang
Gottfried Lesigang

Reputation: 67291

If I get this correctly you are passing the XML as is from C# into a SQL-Server stored procedure. You do not state the actual RDBMS, but I assume this is SQL-Server (due to the error message about the "switch" of an encoding). Hope this is correct, if not, this might help with other databases too...

Some things to know:

  • Within C# XML is either the string you see, or an hierarchically organized document like XmlDocument.
  • In any case, the XML is transposed into its string representation when you pass it over to the database (serialisation)
  • All strings in C# are unicode. You can define a special encoding and shift an encoded string into a byte array, but the string type itself is unicode in any case.
  • SQL Server will take the string and parse it into the native XML data type, which is a hierarchy table internally.
  • All parts of the XML (tag names, content, ...) are stored in SQL-Servers NVARCHAR, which is kind of unicode (UCS-2 actually).
  • SQL Server will - in any case - not allow you to store this declaration together with the XML. It will be omitted in any case...

So what is going on here:

You hand over a string, which is unicode actually, but the string tells the engine: No, I'm SHIFT-JIS!. This string is a liar :-D

This declaration is only needed if you store this XML in any byte container (like a file) and you want to tell a reader how to decode the content.

But between C# and the database there's no need to mess around: The string is plain unicode and will be taken into (almost) plain unicode.

Easy solution:

Pass the XML without the <?xml blah?> declaration.

UPDATE

About your question "how to strip of the declaration"?

At the moment when you've got the XML

string xml = File.ReadAllText(xmlfilePath, Encoding.GetEncoding("SHIFT-JIS"));

You do not have an XML (native type), but you've got a (unicode) string which looks like an XML.

You can use any string method here:

  • Use .IndexOf() to find ?> (end of declaration) and use .Substring() to cut away the declaration entirely
  • Use .Replace() to change your encoding to encoding="utf-16"
  • Use RegEx, whatever you like...

On the other side you might pass the string to the stored procedure as NVARCHAR(MAX) (the SP's parameter) and do the cut-off there before SQL-Server tries to take this as XML. But I'd suggest to solve this on C# side.

Upvotes: 1

Related Questions