How to detect the end of an XML document and the beginning of a new one using libxml2

Question

We have an old server application written in C++ using libxml2. The server receives XML from the client with some parameters and responds with the appropriate data according to those parameters. The server can receive many XML commands from the client in succession without the socket being closed.

How can I detect the end of one XML document and the start of the next?

The protocol does not embed the length of the data; the only thing I have is a stream of data. It worked fine (ahem!) until clients started sending multiple commands in rapid succession!

Example of two commands:

GetBozoData
GetJokerPlan

These could be sent by a client.

For now the code simply searches for a separator string and feeds the data up to that separator to the libxml2 parser.

This works for simple XML, but as soon as there are comments it starts to fall apart. For example, the following does not find the proper delimiter:

GetBozoData   
GetJokerPlan
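
One way to make the separator search tolerate comments is to skip over comment spans while scanning. The following is only a minimal sketch: the helper name `findEndOutsideComments` is hypothetical, and the separator `</cmd>` used in the usage note is an assumed placeholder, not the real one. It handles comments only; a separator inside a CDATA section or a processing instruction would still be matched.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of libxml2).
// Returns the offset just past the first `separator` that does not lie
// inside an XML comment, or std::string::npos when the buffer does not yet
// contain a complete document (more data must be read first).
std::size_t findEndOutsideComments(const std::string& buf,
                                   const std::string& separator)
{
    std::size_t pos = 0;
    while (pos < buf.size()) {
        const std::size_t sep = buf.find(separator, pos);
        if (sep == std::string::npos)
            return std::string::npos;          // no separator at all yet

        const std::size_t open = buf.find("<!--", pos);
        if (open == std::string::npos || sep < open)
            return sep + separator.size();     // separator precedes any comment

        // A comment starts before the separator: jump past the whole comment.
        const std::size_t close = buf.find("-->", open + 4);
        if (close == std::string::npos)
            return std::string::npos;          // comment is still incomplete
        pos = close + 3;
    }
    return std::string::npos;
}
```

With the assumed separator, calling `findEndOutsideComments(buffer, "</cmd>")` gives the length of the first complete document; the bytes before that offset can be fed to the parser and the rest kept for the next command.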

As we are already using libxml2, I was thinking of «ditching» the simple end-of-XML hack and using libxml2 to find where one XML document ends and the next one starts.

I've simplified it a lot. If the client sends commands one after the other it works fine, but if the client sends many commands within the same TCP send, this code only operates on the first command:

    void MyFunc()
    {
       std::vector<std::string> chunks;
       // code to fill the vectors with chunks received over tcp
       // ....
       if (chunks.empty() == false)
       {
          xmlParserCtxtPtr ctxt = xmlCreatePushParserCtxt(&SAXHander, nullptr, chunks[0].c_str(), chunks[0].size(), nullptr);
    
          for (auto fragment = chunks.begin() + 1; fragment != chunks.end(); fragment++)
          {
             xmlParseChunk(ctxt, fragment->c_str(), fragment->size(), 0);
             if (startElems == endElems)
                break;
          }
          xmlParseChunk(ctxt, nullptr, 0, 1);
          // Call function to operate on the parsed data!!
          // reset the parser to start parsing fragments as new xml.
          // ....
          // now free the context
          xmlFreeParserCtxt(ctxt);
       }
    }

I've tried another way:

    void ReadTcp(socket s)
    {
       auto          ctx        = xmlNewParserCtxt();
       xmlSAXHandler saxHandler = MakeSaxHandler();

       auto userData = new UserData;

       userData->s = s;
       ctx->userData  = userData;

       //   auto              buf   = xmlParserInputBufferCreateIO(ssInputReadCallback, ssInputCloseCallback, userData, XML_CHAR_ENCODING_NONE);
       //   auto stream = xmlNewIOInputStream(ctx, buf, XML_CHAR_ENCODING_NONE);

       xmlParserCtxtPtr parser = xmlCreateIOParserCtxt(&saxHandler, userData, TcpReadCallback, TcpCloseCallback, ctx, XML_CHAR_ENCODING_NONE);

       xmlParseDocument(parser);
    }

But, like the previous code, if two XML documents are sent it fails to parse the second one!

As suggested, I tried adding a fake root node, but it was DOA! Some customers start their XML with an XML declaration. When they do, the fake root node makes the XML ill-formed, as a node cannot be followed by that XML header!

For now, what I do is push one character at a time and wait for the event fired by the parser. When it encounters the closing of the root node, a flag is set, the loop stops, and I can start processing what I have.

Of course this is not very efficient; lucky for me, the server is not under heavy load!

I was thinking about having a «crude» parser just to get rid of the XML header in the stream. It would not be perfect, but at least it would boost the parsing speed!
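
A minimal sketch of such a crude pre-filter follows. It assumes that the thing to remove is the XML header mentioned above, taken here to be a declaration of the form `<?xml ... ?>`. The function name `stripXmlDeclarations` is hypothetical. The filter is deliberately naive: it would also strip declaration-like text inside comments or CDATA sections, and it discards any encoding information the declaration carried, so it may only be safe when all clients send the same encoding.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of libxml2).
// Returns a copy of `in` with every complete "<?xml ... ?>" declaration
// removed. Other processing instructions such as "<?xml-stylesheet ...?>"
// are kept, because a declaration must have whitespace right after "<?xml".
// An incomplete declaration at the end of the buffer is left untouched so
// the caller can retry once more data has arrived.
std::string stripXmlDeclarations(const std::string& in)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        const std::size_t start = in.find("<?xml", pos);
        if (start == std::string::npos)
            break;

        const std::size_t after = start + 5;   // first char past "<?xml"
        const bool isDecl = after < in.size() &&
            std::isspace(static_cast<unsigned char>(in[after])) != 0;
        if (!isDecl) {
            out.append(in, pos, after - pos);  // keep it, continue scanning
            pos = after;
            continue;
        }

        const std::size_t end = in.find("?>", after);
        if (end == std::string::npos)
            break;                             // declaration not complete yet

        out.append(in, pos, start - pos);      // copy text before declaration
        pos = end + 2;                         // skip the declaration itself
    }
    if (pos < in.size())
        out.append(in, pos, std::string::npos);
    return out;
}
```

The filtered stream could then be pushed in large chunks after the fake root node's opening tag, instead of one character at a time, with each depth-1 element treated as one command.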
