Israel Lopez

Reputation: 1145

Encoding Strings over Raw Sockets - Extra Characters

I'm troubleshooting a commercial application that is having problems reading the XML I'm sending.

My application is written in Java; the commercial application is written in C# (.NET 4.0) on Windows. The C# application listens on a plain TCP socket for raw XML. I send the XML string as bytes on the wire. Both the Java and the C# code run on the same host, and the data is sent over localhost.

For every other message, the C# application responds with an error indicating malformed XML. Both the commercial team and I are confounded as to why. In the debugger and in the logs, the XML I'm sending is valid. However, once it arrives on the C# side, one or two extra characters have been added to the XML declaration.

What we found in the logs:

 Expected
 <?xml version="1.0" encoding="ISO-8859-1" ?>

 Observed
 <?xml version="1.0" encoding="ISO-8859-M1" ?>
 <?xml oversion="1.0" encoding="ISO-8859-1" ?>
 <?=xml version="1.0" encoding="ISO-8859-1" ?>

I'm sending to the C# application from Java with something like this:

String request = "Whatever";
Socket clientSocket = new Socket(Host, Port);
DataOutputStream outToServer = new DataOutputStream(clientSocket.getOutputStream()) ;
outToServer.writeBytes(request + '\n');

The C# application receives data from the wire as follows:

TcpClient tcpClient = (TcpClient)client;
NetworkStream networkStream = null;
byte[] array = new byte[tcpClient.ReceiveBufferSize];
string text = "";
this.lastTouched = DateTime.Now;
try
{
    networkStream = tcpClient.GetStream();
    do
    {
        int count = networkStream.Read(array, 0, array.Length);
        text += Encoding.ASCII.GetString(array, 0, count);
    }

I have a feeling we are both making mistakes here, but the same code works on other systems. I think that is a coincidence and that we are simply seeing an edge case.

Thoughts?

Upvotes: 2

Views: 1557

Answers (1)

morgano

Reputation: 17422

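Don't use a DataOutputStream; that class is meant for serializing Java primitives, and as far as I understand you are just sending a raw string. Its writeBytes(String) method writes only the low-order byte of each char and discards the high eight bits. Try using the OutputStream directly: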

 // For this to work, the XML declaration must also say UTF-8: <?xml version="1.0" encoding="UTF-8" ?>
clientSocket.getOutputStream().write(request.getBytes("UTF-8"));

The encoding you pass to getBytes(...) has to match the encoding declared in your XML, <?xml version="1.0" encoding="..." ?>, so you may need to experiment with both until they agree.
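As a rough illustration of the point about matching encodings, here is a minimal sketch. The class name XmlEncodingSketch and the helpers encodeRequest and viaWriteBytes are hypothetical names made up for this example; they are not part of either application. It builds the declaration from the same Charset used to produce the bytes, and it shows what DataOutputStream.writeBytes does to a character outside the single-byte range.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class XmlEncodingSketch {

    // Hypothetical helper: the declaration names the same charset that is
    // used to turn the string into bytes, so the two cannot disagree.
    static byte[] encodeRequest(String xmlBody, Charset charset) {
        String declaration =
                "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\" ?>";
        return (declaration + xmlBody + "\n").getBytes(charset);
    }

    // Hypothetical helper: captures what DataOutputStream.writeBytes emits.
    // It writes one byte per char and discards the high eight bits.
    static byte[] viaWriteBytes(String s) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeBytes(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = encodeRequest("<name>\u00D1</name>", StandardCharsets.UTF_8);
        byte[] latin1 = encodeRequest("<name>\u00D1</name>", StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 bytes:      " + utf8.length);
        System.out.println("ISO-8859-1 bytes: " + latin1.length);

        // U+20AC (euro sign) does not fit in one byte; writeBytes keeps only 0xAC.
        byte[] truncated = viaWriteBytes("\u20AC");
        System.out.printf("writeBytes wrote %d byte(s): 0x%02X%n",
                truncated.length, truncated[0] & 0xFF);
    }
}
```

On the real socket, the array returned by encodeRequest would be passed to clientSocket.getOutputStream().write(...). Whether the encoding is actually the cause of the extra characters in the question is not established here; the sketch only shows how to keep the declaration and the bytes consistent.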

In the C# part:

Are you sure that all the characters in the string are ASCII (that is, you don't have chars like Ñ)? It has been ages since I last wrote anything in C#, but it looks like you're using ASCII to decode the string. Wouldn't it be more appropriate to use another encoding?
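To make the ASCII concern concrete, here is a small Java sketch of the same idea (in Java rather than C#, so it is only an analogy for what the receiver does). The class name AsciiDecodeSketch and its helpers are hypothetical. It decodes the same bytes once as US-ASCII and once as ISO-8859-1; the byte 0xD1 (Ñ in ISO-8859-1) survives only in the second case.

```java
import java.nio.charset.StandardCharsets;

public class AsciiDecodeSketch {

    // Decoding as US-ASCII: any byte above 0x7F is not valid ASCII, and
    // Java's String constructor substitutes U+FFFD for it.
    static String decodeAscii(byte[] wire) {
        return new String(wire, StandardCharsets.US_ASCII);
    }

    // Decoding as ISO-8859-1: every byte maps to exactly one character.
    static String decodeLatin1(byte[] wire) {
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // '<', 0xD1 (N with tilde in ISO-8859-1), '>'
        byte[] wire = {0x3C, (byte) 0xD1, 0x3E};

        String ascii = decodeAscii(wire);
        String latin1 = decodeLatin1(wire);

        System.out.println("ASCII decode kept the character:      "
                + ascii.equals("<\u00D1>"));
        System.out.println("ISO-8859-1 decode kept the character: "
                + latin1.equals("<\u00D1>"));
    }
}
```

If the payload really is pure ASCII, both decoders give the same result, so this would only matter when the XML body contains characters outside that range.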

Upvotes: 1
