theMoonlitKnight

Reputation: 451

Strange errors with stream management in ejabberd

I’m building an instant messenger app on iOS that uses ejabberd. I’m currently testing the stream management feature, in particular resumption, which seems to work in most cases. However, there is one case I don’t understand. I can reproduce it through the following steps, with these settings: resume_timeout: 30, resend_on_timeout: if_offline

<message xmlns="jabber:client" from="clientB@mydomain" to="clientA@mydomain/resourceID" type="error" id="CFBF4583-209A-4453-2567-CCCC7894827E">
   <body>test</body>
   <active xmlns="http://jabber.org/protocol/chatstates" />
   <request xmlns="urn:xmpp:receipts" />
   <error code="503" type="cancel">
       <service-unavailable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas" />
   </error>
</message>

I tried with ejabberd 16.01.

This happens 80% of the time; sometimes messages sent by A are correctly delivered to B on reconnection within the 30 seconds.

My questions are:

Upvotes: 1

Views: 1223

Answers (2)

Holger Weiß

Reputation: 21

@xnyhps' response is correct, and I fixed this particular corner case for the next ejabberd release. However, @xnyhps is also correct that there are other corner cases, so if you want reliable message delivery, you should be using XEP-0313. The main feature of XEP-0198 is session resumption.
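To illustrate the XEP-0313 suggestion, here is a minimal sketch of the archive query a client could send after reconnecting, to fetch anything it missed. It is not taken from ejabberd or from any client library. The function name build_mam_catchup_query and the default namespace urn:xmpp:mam:2 are assumptions: the server may advertise a different MAM version (check its service discovery response), so the namespace is a parameter. Paging with an `<after/>` element follows Result Set Management (XEP-0059).

```python
import xml.etree.ElementTree as ET

# Assumption: the server advertises this MAM version; older servers may use another.
MAM_NS = "urn:xmpp:mam:2"
RSM_NS = "http://jabber.org/protocol/rsm"  # Result Set Management (XEP-0059)


def build_mam_catchup_query(query_id, last_archive_id=None, mam_ns=MAM_NS):
    """Build an <iq type='set'> that asks the archive for messages.

    last_archive_id is the archive ID of the last message the client
    already has; if given, only later messages are requested.
    """
    iq = ET.Element("iq", {"type": "set", "id": query_id})
    query = ET.SubElement(iq, f"{{{mam_ns}}}query", {"queryid": query_id})
    if last_archive_id is not None:
        rsm_set = ET.SubElement(query, f"{{{RSM_NS}}}set")
        after = ET.SubElement(rsm_set, f"{{{RSM_NS}}}after")
        after.text = last_archive_id
    return ET.tostring(iq, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    print(build_mam_catchup_query("catchup1", last_archive_id="09af3-cc343-b409f"))
```

The client would persist the archive ID of the last message it processed and pass it in on the next connection, so delivery no longer depends on the stream management queue surviving.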

Upvotes: 1

xnyhps

Reputation: 3316

  • Stream Management acks only indicate that the message has been received by your server. They don't imply that the message has been processed or delivered to the specified address. Even if it was delivered to that address, the receiving device can still return an error for the stanza.
  • This is really just a stab in the dark, but after having a glance over the ejabberd code, this could be what happens:

    1. clientB@mydomain/ResourceB drops their connection, there is now a session awaiting resumption using ResourceB.
    2. Client B reconnects, doesn't resume (because it crashed and lost its state).
    3. Client B binds resource ResourceB again.
    4. Now the server has to terminate the sleeping session that was waiting for resumption because client B requested the same resource.
    5. The server checks whether there are other sessions because it is set to if_offline.
    6. The server sees there is a session (the new session) and therefore chooses to bounce instead of resend.

    So my theory is that if_offline only checks whether there are other sessions at the time the queue of unacknowledged messages has to be handled, not at the time the message was originally received.
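The steps above can be sketched as a toy model. This is not ejabberd's code; it only encodes the theory, and the names Session and decide_unacked are made up for illustration. The policy values mirror the resend_on_timeout settings (true, false, if_offline), written here as strings.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    resource: str
    # Stanzas the server sent to this session that were never acked.
    unacked: list = field(default_factory=list)


def decide_unacked(policy, all_sessions, dying):
    """Decide what happens to the unacked queue of a terminating session.

    The check for other sessions runs now, when the queue is handled,
    not when the stanzas were originally received.
    """
    others = [s for s in all_sessions if s is not dying]
    if policy == "true":
        return "resend"
    if policy == "if_offline" and not others:
        return "resend"
    return "bounce"


if __name__ == "__main__":
    # Steps 1-3: the old session sleeps, then client B binds ResourceB again.
    old = Session("ResourceB", unacked=["test"])
    new = Session("ResourceB")
    # Steps 4-6: the old session is terminated while the new one exists.
    print(decide_unacked("if_offline", [old, new], dying=old))
    # Without a new session, the same queue would be resent.
    print(decide_unacked("if_offline", [old], dying=old))
```

Under this model the sender gets the 503 bounce only when the replacement session already exists at termination time, which would fit the error showing up in some runs but not in others.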

Upvotes: 2
