Reputation: 31
We are trying to move our Apache Camel application to a cluster. So far it has run on a single node and worked well.
One node: I have the readLock strategy set to 'changed', which tracks file changes with a camelLock file and picks the file up for processing only once it has finished downloading. However, the 'changed' readLock strategy is discouraged in clustered setups; according to the Camel documentation, 'idempotent' is recommended. Here is what happens when I test with a 5GB file.
Two nodes: I have the readLock strategy set to 'idempotent', which distributes files to one of the nodes, but Camel starts processing the file before it has finished downloading.
Is there a way to stop Camel from processing a file before it has finished downloading when the readLock strategy is 'idempotent'?
Upvotes: 1
Views: 4190
Reputation: 73
You can use readLock=idempotent-changed. The idempotent-changed option uses an idempotentRepository and changed as the combined read-lock. This allows you to use read locks that support clustering, provided the idempotent repository implementation supports it.
You can read more about the idempotent-changed option here: https://camel.apache.org/components/3.13.x/file-component.html
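As a rough sketch of what such an endpoint could look like, the helper below only assembles the URI string using plain Java, so it runs without Camel on the classpath. The directory, the bean name sharedRepo, and the numeric values are illustrative assumptions, not recommendations; the bean would have to be a cluster-aware idempotent repository registered in your Camel registry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a file endpoint URI for a clustered consumer.
final class ClusteredFileEndpoint {

    static String uri(String directory, String repositoryBeanName) {
        Map<String, String> options = new LinkedHashMap<>();
        // Combined lock: idempotent repository plus the "changed" check.
        options.put("readLock", "idempotent-changed");
        // "#name" refers to a bean in the Camel registry (name is an assumption).
        options.put("idempotentRepository", "#" + repositoryBeanName);
        // How often (ms) the changed check re-examines the file (illustrative value).
        options.put("readLockCheckInterval", "5000");
        // How long (ms) to wait for the lock before giving up (illustrative value).
        options.put("readLockTimeout", "600000");

        String query = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return "file:" + directory + "?" + query;
    }
}
```

The resulting string is what you would pass to from(...) in a route, for example ClusteredFileEndpoint.uri("/data/inbox", "sharedRepo").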
Upvotes: 1
Reputation: 1
We also used readLock=changed in Docker clustered mode, and it worked perfectly because we set readLockMinAge to a suitable interval.
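To illustrate the idea behind a minimum-age rule, here is a small plain-Java sketch. It is not Camel's implementation; the class and method names are made up for this example. It only shows the kind of check involved: a file is left alone until its last-modified time is at least the configured age in the past.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration of a minimum-age check (not Camel code).
final class MinAgeCheck {

    // True when lastModified is at least minAge before now.
    static boolean isOldEnough(Instant lastModified, Duration minAge, Instant now) {
        return !lastModified.plus(minAge).isAfter(now);
    }

    // Convenience overload that reads the timestamp from the file system.
    static boolean isOldEnough(Path file, Duration minAge) {
        try {
            Instant lastModified = Files.getLastModifiedTime(file).toInstant();
            return isOldEnough(lastModified, minAge, Instant.now());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A minimum age only helps if the sender keeps touching the file while writing; a stalled transfer that resumes after the interval would still be picked up too early.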
Upvotes: 0
Reputation: 2937
Even though both "readLock=changed" and "readLock=idempotent" cause the file-consumer to wait, they really address quite different use-cases: while "readLock=changed" guards against the file being incomplete (i.e. still being written by some producer/sender), "readLock=idempotent" guards against a file being read by two consumer routes. It's a bit confusing that they're addressed by the same option.
First, to address the "changed" scenario: can the sender be changed so that it writes the file in one directory and then, when it is done writing, moves it into the directory being monitored by your file-consumer? A move (rename) within the same filesystem is atomic, whereas a copy would leave a partially written file visible again. If this is under your control, this is a good way of letting the OS handle things instead of trying to deal with it yourself. (This does not address the issue of the multiple readers.) Otherwise, I suggest you revert to readLock=changed.
Next, on multiple readers, one workaround is to run this route on only one node of your cluster. Sometimes this might defeat the purpose of clustering, but it is quite possible that you're starting up additional nodes to help with some other routes, and you're fine with this particular route running on just one node. It's a bit of a hack, because the nodes are no longer all equal, but it is still an option to consider. The simplest approach would be to start one node with some environment property that flags it as the node that will handle file-reading, or something similar.
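If the sender happens to be a Java program, the staging approach could look roughly like the sketch below. The class name and directory layout are assumptions for illustration. It relies on an atomic move, which generally requires the staging and monitored directories to be on the same filesystem; otherwise Files.move throws AtomicMoveNotSupportedException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sender-side helper: publish a fully written file in one step.
final class StagedDelivery {

    // Moves stagedFile into inboxDir under the same file name and returns the new path.
    static Path publish(Path stagedFile, Path inboxDir) {
        Path target = inboxDir.resolve(stagedFile.getFileName());
        try {
            // The consumer sees either no file or the complete file, never a partial one.
            return Files.move(stagedFile, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sender writes to the staging directory for as long as it needs and calls publish only once the file is closed.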
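One possible shape for such a flag, sketched in plain Java: the variable name FILE_READER_NODE is purely an assumption, and the environment is passed in as a map so the logic can be exercised without touching the real process environment.

```java
import java.util.Map;

// Hypothetical helper: decides whether this node should run the file-reading route.
final class FileReaderNodeFlag {

    // Assumed variable name; pick whatever fits your deployment.
    static final String ENV_NAME = "FILE_READER_NODE";

    // True only when the variable is present and equals "true" (case-insensitive).
    static boolean isFileReaderNode(Map<String, String> env) {
        return Boolean.parseBoolean(env.getOrDefault(ENV_NAME, "false"));
    }
}
```

In a route definition, the result could then be fed to the route's auto-startup setting, for example by passing FileReaderNodeFlag.isFileReaderNode(System.getenv()) to autoStartup(...), so that the route is defined everywhere but started on only one node.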
If you do want the route on multiple nodes, you can start by using the option "idempotent=true" but this is not good enough on its own. The option uses a repository, where it records what files have been read before, and the default repository is in-memory (i.e. each node has its own). So, the default implementation is helpful if the same file is actually being received more than once, and you wish to skip it. However, if you want it to work across nodes, you have to use a different repository.
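The difference between a per-node and a shared repository can be seen in a toy model. The class below is not Camel's IdempotentRepository, just a minimal stand-in with an add method that succeeds only for the first caller to claim a key.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for an idempotent repository (not a Camel class).
final class ToyIdempotentRepository {

    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Returns true only for the first caller that registers the key.
    boolean add(String key) {
        return seen.add(key);
    }
}
```

If two nodes each hold their own instance, both calls to add("big.dat") return true and both nodes process the file. If they share one instance (in a real cluster, a database table or a distributed cache plays that role), the second call returns false and that node skips the file.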
One central repository could be a database. In that case you can use Camel's JDBC- or JPA-based repositories. Or you could use something like Hazelcast. See here for your options: http://camel.apache.org/idempotent-consumer.html
Upvotes: 2