Reputation: 1471
I have multiple JSON format files which is being pushed to the Azure storage account under a specific container. There are n number of files in the container.
And 4 to 8 nodes which will be accessing the Azure storage container to downloaded the files locally, the download code is written in java.
Since there are n number of files and multiple file accessing the container at the same time, how to avoid the situation that the same file is downloaded by another server?
Example:
Azure container has 1.json, 2.json, 3.json, etc which are > 35 MB size.
batch-process-node1 -> starts downloading 1.json
batch-process-node2 -> starts downloading 2.json
batch-process-node3 -> should not start downloading the 1.json
Is there any logic to be built for each node which has the java process to download the file uniquely? Is there any setting that can be set in the Azure storage container?
--
New to Azure storage blob, any help is appreciated.
Upvotes: 1
Views: 849
Reputation: 1471
Since we are already using Apache camel in the code, we tried to use camel azure-blob component to address the issue. Below is the approach we used, still the race condition is acceptable for our scenario. Camel route started with timer consumer, and producer to get the list of blob from container using below endpoint,
azure-blob://<account>/<container>?credentials=#storagecredentials&blobType=blockBlob&operation=listBlobs
Note: storagecredential is a bean of type StorageCredentialsAccountAndKey class.
Created a java class implementing the Processor of camel, and in process() method, using the exchange.getIn().getBody() => which provides an iterable object with has ListBlobItem.
first i set the meta data of the blob using below endpoint
azure-blob://<account>/<container>/*<blobName>*?credentials=#storagecredentials&blobType=blockBlob&operation=updateBlockBlob&blobMetadata=#blobMetaData1
Note: blobMetaData1 is bean created in the context file.
<util:map id="blobMetaData1" map-class="java.util.HashMap">
<entry key="someKey" value="someValue"/>
</util:map>
Key thing: In this class process method
using the recipientList camel option which invokes the metadata endpoint to update the specific blob.
Then used another processor to form the download blob endpoint
azure-blob://<account>/<container>/*<blobName>*?credentials=#storagecredentials&blobType=blockBlob&operation=getBlob
and using the recipientList to get the processor endpoint from message header.
finally forming another delete endpoint which will delete once its downloaded.
Upvotes: 0