Reputation: 35
In Java, I am polling a WebSphere MQ message queue, expecting a message of `STRING format, that is composed entirely of XML. Part of this XML will contain bytes to a file attachment (any format: pdf, image, etc) which will then be converted to a blob for storage in an Oracle Db and later retrieval.
The issue I am having is that the known size of example files being sent over end up in my Db with a different size. I am not adding anything to the bytes (as far as I know), and the size appears to be larger directly after I get the message. I cannot determine if I am somehow adding information at retrieve, conversion from bytes
-> String
, or if this is happening on the front end when the sender populates the message.
My code at retrieve of the message:
inboundmsg = new MQMessage();
inboundmsg = getMQMsg(FrontIncomingQueue, gmo);
strLen = inboundmsg.getMessageLength();
strData = new byte[strLen];
ibm_id = inboundmsg.messageId;
inboundmsg.readFully(strData);
inboundmsgContents = new String(strData);
I see a file known to have size 21K go to 28K. A coworker has suggested that charset/encoding may be the issue. I do not specify a charset in the constructor call to String
above, nor in any of the calls to getBytes
when converting back from a string (for other unrelated uses). My default charset is ISO-8859-1. When speaking with the vendor who is initiating the message transfer, I asked her what charset she is using. Her reply:
"I am using the File.WriteAllBytes method in C# - I pass it the path of my file and it writes it to a byte[]. I could not find any documentation on the MSDN about what encoding the function uses. The method creates a byte array and from what I have read online this morning there is no encoding, its just a sequence of 8bit unsigned binary data with no encoding."
Another coworker has suggested that perhaps the MQ charset is the culprit, but my reading of the documentation suggests that MQ charset only affects the behavior of readString
, readLine
, & writeString
.
If I circumvent MQ totally, and populate a byte array using a File Input Stream and a local file, the file size is preserved all the way to Db store, so this definitely appears to be happening at or during message transfer.
Upvotes: 2
Views: 2678
Reputation: 31852
The problem is evident in the wording of the question. You describe a payload that contains arbitrary binary data and also trying to process it as a string. These two things are mutually exclusive.
This appears to be complicated by the vendor not providing valid XML. For example, consider the attachment:
<PdfBytes>iVBORw0KGgoAAAANS … AAAAASUVORK5CYII=</PdfBytes>
If the attachment legitimately contains any XML special character such as <
or >
then the result is invalid XML. If it contains null bytes, some parsers assume they have reached the end of the text and stop parsing there. That is why you normally see any attachment in XML either converted to Base64 for transport or else converted to hexadecimal.
The vendor describes writing raw binary data which suggests that what you are receiving contains non-string characters and therefore should not be sent as string data. If she had described some sort of conversion that would make the attachment XML compliant then string would be appropriate.
Interestingly, a Base64 encoding results in a payload that is 1.33 times larger than the original. Coincidence that 21k * 1.3 = 28k
? One would think that what is received is actually the binary payload in Base64 format. That actually would be parseable as a string and accounts for the difference in file sizes. But it isn't at all what the vendor described doing. she said she's writing "8bit unsigned binary data with no encoding" and not Base64.
So we expect it to fail but not necessarily to result in a larger payload. Consider that WebSphere MQ receiving a message in String
format will attempt to convert it. If the CCSID of the message differs from that requested on the GET
then MQ will attempt a conversion. If the inbound CCSID is UTF-16 or any double-byte character set, certain characters will be expanded from one to two bytes - assuming the conversion doesn't hit invalid binary characters that cause it to fail.
If the two CCSIDs are the same then no conversion is attempted in the MQ classes but there is still an issue in that something has to parse an XML payload that is by definition not valid and therefore subject to unexpected results. If it happens that the binary payload does not contain any XML special characters and the parser doesn't choke on any embedded null bytes, then the parser is going to rather heroic lengths to forgive the non-compliant payload. If it gets to the </PdfBytes>
tag without choking, it may assume that the payload is valid and convert everything between the <PdfBytes>...</PdfBytes>
tags itself. Presumably to Base64.
All of this is conjecture, of course. But in a situation where the payload is unambiguously not string data any attempt to parse it as string data will either fail outright or produce unexpected and potentially bizarre results. You are actually unfortunate that it doesn't fail outright because now there's an expectation that the problem is on your end when it clearly appears to be the vendor's fault.
Assuming that the content of the payload remains unchanged, the vendor should be sending bytes
messages and you should be receiving them as bytes
. That would at least fix the problems MQ is having reconciling the expected format with the actual received format, but it would still be invalid XML. If it works that the vendor sends binary data in a message set to type String
with you processing it as bytes
then count your blessings and use it that way but don't count on it being reliable. Eventually you'll get a payload with an embedded XML special character and then you will have a very bad day.
Ideally, the vendor should know better than to send binary data in an XML payload without converting it first to string and it is up to them to fix it so that it is compliant with the XML spec and reliable.
Please see this MSDN page: XML, SOAP, and Binary Data
Upvotes: 1