Daniel Anderson

Reputation: 67

How to detect the end of one XML document and the beginning of a new one using libxml2

We have an old server application written in C++ using libxml2. The server receives XML from the client with some parameters and responds with the appropriate data according to those parameters. The server can now receive many XML commands from the client in succession without the socket being closed.

How can I detect the end of one XML document and the start of another?

The protocol does not embed the length of the data; the only thing I have is a stream of data. It worked fine (ahem!) until clients started sending multiple commands rapidly!

For example, a client could send these two commands:

    <Command User="Bozo">GetBozoData</Command>
    <Command User="Joker">GetJokerPlan</Command>

For now the code simply searches for the closing tag of the root element (</Command> in the example above) as a separator and uses the data up to the separator to feed the libxml2 parser.

This works for simple XML, but as soon as you have comments it starts to fall apart. For example, the following does not find the proper delimiter:

    <Command User="Bozo">GetBozoData  <!-- trip simple parser </Command>
    --> </Command>
    <Command User="Joker">GetJokerPlan</Command>
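
To make the failure concrete, here is a minimal sketch of that kind of separator search. `naiveSplit` is a hypothetical stand-in, not the server's actual code, and it assumes the separator is the literal closing tag `</Command>`:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: cuts the stream after every occurrence of the closing
// tag, with no awareness of comments, CDATA sections or nesting.
std::vector<std::string> naiveSplit(const std::string& stream,
                                    const std::string& closeTag = "</Command>")
{
    std::vector<std::string> docs;
    std::size_t begin = 0;
    for (;;) {
        const std::size_t pos = stream.find(closeTag, begin);
        if (pos == std::string::npos)
            break;  // no complete "document" left in the buffer
        const std::size_t end = pos + closeTag.size();
        docs.push_back(stream.substr(begin, end - begin));
        begin = end;
    }
    return docs;
}
```

Fed the three lines above, this returns three pieces instead of two: the first piece stops at the `</Command>` inside the comment, so it ends in an unterminated comment and is not well-formed.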

As we are already using libxml2, I was thinking of «ditching» the simple end-of-XML hack and using libxml2 to find where one XML document ends and the next one starts.

I've simplified the code a lot. If the client sends commands one after the other it works fine, but if the client sends many commands within the same TCP send, this code only operates on the first command:

    void MyFunc()
    {
       std::vector<std::string> chunks;
       // code to fill the vectors with chunks received over tcp
       // ....
       if (chunks.empty() == false)
       {
          xmlParserCtxtPtr ctxt = xmlCreatePushParserCtxt(&SAXHander, nullptr, chunks[0].c_str(), chunks[0].size(), nullptr);
    
          for (auto fragment = chunks.begin() + 1; fragment != chunks.end(); fragment++)
          {
             xmlParseChunk(ctxt, fragment->c_str(), fragment->size(), 0);
             if (startElems == endElems)
                break;
          }
          xmlParseChunk(ctxt, nullptr, 0, 1);
          // Call function to operate on the parsed data!!
          // reset the parser to start parsing fragments as new xml.
          // ....
          // now free the context
          xmlFreeParserCtxt(ctxt);
       }
    }

I've tried another way:

    void ReadTcp(socket s)
    {
       auto          ctx        = xmlNewParserCtxt();
       xmlSAXHandler saxHandler = MakeSaxHandler();

       auto userData = new UserData;

       userData->s = s;
       ctx->userData  = userData;

       //   auto              buf   = xmlParserInputBufferCreateIO(ssInputReadCallback, ssInputCloseCallback, userData, XML_CHAR_ENCODING_NONE);
       //   auto stream = xmlNewIOInputStream(ctx, buf, XML_CHAR_ENCODING_NONE);

       xmlParserCtxtPtr parser = xmlCreateIOParserCtxt(&saxHandler, userData, TcpReadCallback, TcpCloseCallback, ctx, XML_CHAR_ENCODING_NONE);

       xmlParseDocument(parser);
    }

But, like the previous code, if two XML documents are sent it fails to parse the second one!

As suggested, I tried adding a fake root node, but it was DOA! Some customers start their XML with an XML declaration (the <?xml ... ?> header). When they do, the fake root node makes the XML ill-formed, as a node cannot be followed by that XML header!

For now, I push one char at a time and wait for the events fired by the parser. When the parser encounters the closing of the root node, a flag is set, the loop stops, and I can start processing what I have.

Of course this is not very efficient; luckily for me, the server is not under heavy load!

I was thinking about having a «crude» parser just to get rid of the XML header in the stream. It would not be perfect, but at least it would boost the parsing speed!
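
A rough sketch of what such a crude pre-filter might look like, assuming the thing to remove is the XML declaration (`<?xml ... ?>`). `stripXmlDeclarations` is a hypothetical helper, and it assumes each declaration arrives complete within the buffer it is given:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: drops every "<?xml ... ?>" declaration from a buffer.
// It is deliberately crude: it does not know about comments or CDATA, so a
// declaration-like string inside those would be removed as well.
std::string stripXmlDeclarations(const std::string& in)
{
    static const std::string open  = "<?xml";
    static const std::string close = "?>";

    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        const std::size_t start = in.find(open, pos);
        if (start == std::string::npos) {
            out.append(in, pos, std::string::npos);  // nothing left to strip
            break;
        }
        const std::size_t after = start + open.size();
        // Only "<?xml" followed by whitespace or '?' is a declaration;
        // this leaves targets such as "<?xml-stylesheet" alone.
        const bool isDecl = after < in.size() &&
            (in[after] == ' ' || in[after] == '\t' || in[after] == '\r' ||
             in[after] == '\n' || in[after] == '?');
        if (!isDecl) {
            out.append(in, pos, after - pos);
            pos = after;
            continue;
        }
        const std::size_t end = in.find(close, after);
        if (end == std::string::npos) {
            out.append(in, pos, std::string::npos);  // unterminated: keep as is
            break;
        }
        out.append(in, pos, start - pos);  // keep the text before the declaration
        pos = end + close.size();          // skip the declaration itself
    }
    return out;
}
```

A real version would also have to hold back the tail of a TCP read when a declaration is split across two reads, which this sketch does not attempt.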

Upvotes: -1

Views: 221

Answers (1)

nwellnhof

Reputation: 33668

This isn't a trivial problem. I'd really suggest that you change your protocol. But if that's not possible I can try to give you a few hints. I'm assuming that you're using a custom SAX handler, not something derived from the default handler which builds an xmlDoc.

First of all, you'll have to keep the parser context around across multiple requests and initialize it with a "fake" root node. With the push parser approach, it should be enough to call xmlParseChunk(ctxt, "<Root>", ...). With the pull parser, it's more complicated.

Then, with the push parser approach, use the endElement events of the SAX parser to detect completed commands. With the pull parser approach, try to call xmlParseElement instead of xmlParseDocument.

Upvotes: 1
