AlexeySelivanov

Reputation: 11

QuickFIX/J — strategy to resend all messages in a sequence (full day recovery)

Our project (server/acceptor side) needs to recover from communication failures. In particular, it must respond to a message of type MsgType.RESEND_REQUEST with BeginSeqNo == 1 and EndSeqNo == 0, which is a request to resend all messages. In response, we want to resend every message in the sequence from 1 to the latest.

The feature was initially designed to handle the resend request at the application level: detect a message of type MsgType.RESEND_REQUEST in the fromAdmin() method of a class that implements the quickfix.Application interface, then send all messages again with the proper possible-duplicate flags and gap fills.
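To make the "resend everything" condition concrete, here is a minimal sketch that checks a raw FIX string for MsgType (35) = 2, BeginSeqNo (7) = 1 and EndSeqNo (16) = 0. The class ResendRequests and its methods are hypothetical names invented for illustration and are not part of QuickFIX/J; the sketch deliberately avoids the library so it can run on its own.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a QuickFIX/J class): inspects a raw FIX string. */
final class ResendRequests {
    private static final char SOH = '\u0001'; // standard FIX field delimiter

    /** Splits "tag=value<SOH>tag=value<SOH>..." into a map; the first occurrence of a tag wins. */
    static Map<Integer, String> parse(String raw) {
        Map<Integer, String> fields = new HashMap<>();
        for (String pair : raw.split(String.valueOf(SOH))) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed tokens
            }
            fields.putIfAbsent(Integer.parseInt(pair.substring(0, eq)), pair.substring(eq + 1));
        }
        return fields;
    }

    /** True for a ResendRequest (35=2) asking for 1 through "infinity" (7=1, 16=0). */
    static boolean isFullResend(String raw) {
        Map<Integer, String> f = parse(raw);
        return "2".equals(f.get(35)) && "1".equals(f.get(7)) && "0".equals(f.get(16));
    }
}
```

Inside fromAdmin() the same check would more likely read the fields from the quickfix.Message object rather than from a raw string; the string form is used here only to keep the sketch self-contained.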

While testing this, we discovered that QuickFIX/J already has built-in recovery at the session level, which interferes with our custom implementation. Recovery is implemented by the quickfix.Session class: messages of type RESEND_REQUEST are detected by the next() method and then processed by the nextResendRequest() method. nextResendRequest() tries to retrieve and resend messages from the MessageStore, and replaces them with Gap Fills if the store is unable to provide them.
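The "resend what the store has, gap-fill the rest" behavior can be illustrated with a small sketch. This is not the code of nextResendRequest(); ResendPlanner and the textual action format are hypothetical, and the only assumption taken from the description above is that missing messages are covered by Gap Fills. Consecutive missing sequence numbers are coalesced into one gap fill whose NewSeqNo is the next number the receiver should expect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; does not reproduce QuickFIX/J internals. */
final class ResendPlanner {

    /** Returns a readable list of actions for sequence numbers begin..end (inclusive). */
    static List<String> plan(Map<Integer, String> stored, int begin, int end) {
        List<String> actions = new ArrayList<>();
        int gapStart = -1; // first sequence number of the gap currently being collected
        for (int seq = begin; seq <= end; seq++) {
            if (stored.containsKey(seq)) {
                if (gapStart != -1) {
                    // Close the open gap: tell the receiver to jump to this sequence number.
                    actions.add("GAPFILL " + gapStart + " NewSeqNo=" + seq);
                    gapStart = -1;
                }
                actions.add("RESEND " + seq + " PossDupFlag=Y");
            } else if (gapStart == -1) {
                gapStart = seq; // open a new gap
            }
        }
        if (gapStart != -1) {
            // The range ended inside a gap: jump past the end of the requested range.
            actions.add("GAPFILL " + gapStart + " NewSeqNo=" + (end + 1));
        }
        return actions;
    }
}
```

For example, with messages 1, 2 and 5 available and a requested range of 1 to 6, the plan is two resends, a gap fill covering 3 and 4, a resend of 5, and a final gap fill covering 6.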

Now we want either to disable the built-in recovery and let our application do the whole job, or to make the built-in recovery use our custom message source (Kafka topics).

The problem with disabling it is that we see no legitimate way to do so: there are no settings or customization points. The QuickFIX/J session layer seems to be designed for its own automated resend processing only and offers no public API for customization beyond choosing the MessageStore implementation.

The problem with using the built-in mechanism is that at some point it wants all messages in memory as an ArrayList. This won't work for volumes of 100k-1M messages; we need a way to stream the data.

We see the following options to proceed with the implementation:

1. Customizing MessageStore for streaming

Provide a MessageStore implementation that relies on data stored in Kafka and retrieves a sequence of messages upon request. The problem with this option is that the MessageStore API for retrieving messages is synchronous. It expects to receive the full list of messages for the resend in one block (an ArrayList), so we can't stream messages. This is unacceptable for large volumes because of memory consumption and processing time.

We wonder whether we can somehow adjust MessageStore to use a visitor-like source that feeds messages one at a time.
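As a rough sketch of what such a visitor-like source could look like, the code below walks a sequence range in fixed-size batches and hands each message to a callback, so at most one batch is held in memory at a time. MessageSource and StreamingResender are hypothetical names, not QuickFIX/J types, and the sketch does not plug into the existing MessageStore interface; how fetch() would be backed by Kafka is left open and is an assumption of this example.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical source of stored messages, e.g. backed by a Kafka topic. */
interface MessageSource {
    /** Returns the stored messages with sequence numbers fromSeq..toSeq (inclusive). */
    List<String> fetch(int fromSeq, int toSeq);
}

/** Hypothetical helper that streams a range batch by batch instead of loading it all. */
final class StreamingResender {

    /**
     * Feeds every message in begin..end to the sink, fetching at most batchSize
     * messages per call. Returns the largest batch held in memory, for illustration.
     */
    static int stream(MessageSource source, int begin, int end, int batchSize, Consumer<String> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int maxHeld = 0;
        for (int from = begin; from <= end; from += batchSize) {
            int to = Math.min(end, from + batchSize - 1);
            List<String> batch = source.fetch(from, to);
            maxHeld = Math.max(maxHeld, batch.size());
            batch.forEach(sink); // the visitor: one message at a time
        }
        return maxHeld;
    }
}
```

The batch size trades memory for the number of round trips to the backing store; the sink is where each message would be marked as a possible duplicate and written to the session.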

2. Customizing session implementation

Replace Session class with a customized version that skips internal resend processing. There is no straightforward way to do it since:

Some hacking through reflection could be possible, but it would not be easy.

3. Bypassing with exceptions

Throw exceptions from methods available at the application level to cancel the built-in handling at the session level. This might work if combined carefully with the RejectMessageOnUnhandledException setting to tolerate failures in message processing, but there is a risk that the framework state will be corrupted or that other side effects will show up.

Can you please advise which option is best? Or suggest another strategy for recovering a full sequence of 100k-1M messages sourced from Kafka?

Upvotes: 1

Views: 817

Answers (1)

Christoph John

Reputation: 3293

Regarding the specification, you were almost at the correct location, but I have to admit that it is a bit tricky to find. From the page that you linked, go down to "FIX Application Layer" -> FIX Latest -> Infrastructure -> Specification.

https://www.fixtrading.org/online-specification/business-area-infrastructure/

Or more specifically https://www.fixtrading.org/online-specification/business-area-infrastructure/#category-application-sequencing

Some quotes from there, which to me sound like exactly what you are looking for:

application sequencing and recovery makes provision for the desired messages – and only the desired messages – to be seamlessly requested and resent while retaining the standard behaviors of the session protocol. It also provides the receiver with the flexibility to put off recovery of application level messages until a slow period or after the market has closed.

Application sequencing is not something that will be used in a normal order routing scenario. It has more relevance in large volume one-way connections in which the receiver would like to have some ability to control the data that is resent after a disconnect or data loss.

It should only be used for managing the flow of data when a FIX connection is used to deliver data in bulk...

However, as the name suggests, application-level sequencing needs to be implemented in your application, i.e. there is no built-in support for it in QFJ.
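To give an idea of the kind of bookkeeping this leaves to the application, here is a minimal receiver-side sketch that tracks the last application sequence number seen per application ID and reports the first missing number when a gap appears. ApplSequenceTracker is a hypothetical class; the parameter names follow the ApplID and ApplSeqNum fields of the FIX application sequencing category, and what to do on a gap (for example, requesting a resend) is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical receiver-side tracker for application-level sequence numbers. */
final class ApplSequenceTracker {
    private final Map<String, Integer> lastSeen = new HashMap<>();

    /**
     * Records a message and returns the first missing sequence number if a gap
     * was detected, or -1 if the message was in order or an old duplicate.
     */
    int onMessage(String applId, int applSeqNum) {
        int expected = lastSeen.getOrDefault(applId, 0) + 1;
        if (applSeqNum >= expected) {
            lastSeen.put(applId, applSeqNum); // advance only on new messages
        }
        return applSeqNum > expected ? expected : -1;
    }
}
```

Because each application ID is tracked separately, a gap in one feed does not affect the others, and recovery can be deferred until a quiet period as the quoted specification text suggests.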

Upvotes: 0
