Reputation: 146
I have an XML string coming from Java in base64-encoded format.
PHJvb3Q+PGNoaWxkPiY8L2NoaWxkPjxjaGlsZD48PC9jaGlsZD48Y2hpbGQ+PjwvY2hpbGQ+PGNoaWxkPns8L2NoaWxkPjxjaGlsZD59PC9jaGlsZD4vcm9vdD4=
I decode it using xdmp:base64-decode(), which gives me the following output:
<root><child>&</child><child><</child><child>></child><child>{</child><child>}</child>/root>
The output is a string. To convert it to XML, I use xdmp:unquote(), but the special characters in the string produce an error.
I also tried the repair-full option with xdmp:unquote(), but it didn't resolve the issue.
Note: my actual data contains special characters like these, and they are what cause the errors.
How can I handle this kind of scenario so that I can insert the XML into MarkLogic?
Upvotes: 1
Views: 1716
Reputation: 66714
The text from that base64-encoded string is not well-formed XML. In addition to the & and < not being escaped properly, the closing tag for the root element is missing its <. At the end of the string, </child>/root> should be </child></root>.
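To see exactly what the parser objects to, one option is to wrap the call in a try/catch and inspect the error. This is a minimal sketch, assuming the 1.0-ml dialect and that the parse error is raised inside the try block; the variable names are only illustrative.

```xquery
xquery version "1.0-ml";

declare namespace error = "http://marklogic.com/xdmp/error";

(: Decode the base64 payload from the question into a plain string. :)
let $decoded := xdmp:base64-decode(
  "PHJvb3Q+PGNoaWxkPiY8L2NoaWxkPjxjaGlsZD48PC9jaGlsZD48Y2hpbGQ+PjwvY2hpbGQ+PGNoaWxkPns8L2NoaWxkPjxjaGlsZD59PC9jaGlsZD4vcm9vdD4="
)
return
  try {
    (: Attempt to parse the string as XML. :)
    xdmp:unquote($decoded)
  } catch ($e) {
    (: $e is an error:error element; report its code and message text. :)
    <parse-failed>
      <code>{ $e/error:code/string() }</code>
      <message>{ $e/error:format-string/string() }</message>
    </parse-failed>
  }
```

The reported code and message should point at the first offending character, which helps decide which repairs the string needs.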
As an example of how the text could be scrubbed and repaired, the code below fixes this specific decoded value and then uses xdmp:unquote() to parse it as XML:
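xdmp:unquote(
  replace(
    replace(
      replace(
        xdmp:base64-decode("PHJvb3Q+PGNoaWxkPiY8L2NoaWxkPjxjaGlsZD48PC9jaGlsZD48Y2hpbGQ+PjwvY2hpbGQ+PGNoaWxkPns8L2NoaWxkPjxjaGlsZD59PC9jaGlsZD4vcm9vdD4="),
        (: escape the bare ampersand first; "&amp;" is how an XQuery string literal spells "&" :)
        "&amp;", "&amp;amp;"),
      (: escape the "<" that appears as text inside a child element :)
      "><<", ">&amp;lt;<"),
    (: restore the "<" missing from the closing root tag :)
    "/root>", "</root>")
)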
It produces the following well-formed XML:
<root>
  <child>&amp;</child>
  <child>&lt;</child>
  <child>></child>
  <child>{</child>
  <child>}</child>
</root>
However, this sort of string-level repair is tedious and becomes fragile as the input varies, because every new kind of breakage needs its own replacement rule. It is probably best to use a tool such as TagSoup to repair the markup and turn it into well-formed XML.
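If string-level repair is still the chosen route, the ampersand step can at least be made a little more general. The sketch below uses a hypothetical helper, local:escape-bare-ampersands, which is not part of MarkLogic. It escapes every ampersand and then restores the ones that already began a predefined entity or a numeric character reference. It assumes those are the only references in the data, and it does nothing about stray < characters or broken tags.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: escape "&" unless it already starts a predefined
   entity (amp, lt, gt, quot, apos) or a numeric character reference. :)
declare function local:escape-bare-ampersands($text as xs:string) as xs:string
{
  (: Step 1: escape every ampersand, including those in existing references. :)
  let $escaped := replace($text, "&amp;", "&amp;amp;")
  (: Step 2: undo the double escaping for references that were already valid. :)
  return replace(
    $escaped,
    "&amp;amp;(amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);",
    "&amp;$1;"
  )
};

(: The input string here is "a & b &amp; c"; only the bare ampersand changes. :)
local:escape-bare-ampersands("a &amp; b &amp;amp; c")
```

The two-pass approach is used because the regular expression syntax of fn:replace has no lookahead, so "an ampersand not followed by an entity name" cannot be matched directly.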
Upvotes: 1