Dhiraj

Reputation: 3696

Exporting 1TB data out of ADX

Ideally, I want to set up a pipeline that exports a large amount of data (1 TB) out of ADX to ADLS Gen2 at an hourly interval. I believe the ADF copy activity is inferior to ADX's native export feature, so I experimented with the on-demand export feature (the .export command). The ADX cluster and the destination ADLS account are in the same region. However, because of the sheer volume of data, the export always times out (ADX sets a 1-hour cap). I have tried a few options, but so far none of the combinations has produced satisfactory results. I am using the default distribution for the export (which I believe is per-shard), but given the volume of data, I think I will need to scale out the number of nodes sufficiently. Would that help? Is there any out-of-the-box solution for exporting data at this scale out of ADX, maybe some backend method?

Upvotes: 1

Views: 540

Answers (1)

yifats

Reputation: 2744

That's right: a single export command is limited to 1 hour, and you cannot increase this limit. The recommendation is to split your data across multiple export commands so that each one exports a subset of the data (you can partition by ingestion_time()). If you run multiple such exports concurrently, you may hit storage throttling limits (depending on the number of shards each query covers), so it's recommended to use multiple storage accounts. When you provide multiple accounts to a single export command, ADX distributes the load between them.
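
As a rough illustration of the splitting idea, here is a minimal Python sketch that only builds the text of several .export commands, each covering one ingestion_time() window and each listing more than one storage connection string. The table name, the storage URIs, the choice of parquet, and the use of `async` and `namePrefix` are all assumptions made for the example, not something taken from your setup; adjust them to your environment and run the resulting commands with whatever client you already use.

```python
from datetime import datetime

def build_export_commands(table, start, end, parts, storage_uris):
    """Return one .export command string per ingestion_time() window.

    table        -- hypothetical table name
    storage_uris -- placeholder storage connection strings; listing several
                    lets ADX spread the writes across accounts
    """
    step = (end - start) / parts
    # Window boundaries; the last one is pinned to `end` to avoid rounding drift.
    bounds = [start + i * step for i in range(parts)] + [end]
    # h@"..." marks the connection string as an obfuscated string literal.
    uris = ", ".join(f'h@"{u}"' for u in storage_uris)

    commands = []
    for i in range(parts):
        lo, hi = bounds[i], bounds[i + 1]
        commands.append(
            f".export async compressed to parquet ({uris}) "
            f'with (namePrefix="{table}_{i:03d}") <| '
            f"{table} "
            f"| where ingestion_time() >= datetime({lo.isoformat()}) "
            f"and ingestion_time() < datetime({hi.isoformat()})"
        )
    return commands

if __name__ == "__main__":
    # Placeholder accounts and table; one hour of data split into 4 windows.
    accounts = [
        "abfss://exports@account1.dfs.core.windows.net/hourly;impersonate",
        "abfss://exports@account2.dfs.core.windows.net/hourly;impersonate",
    ]
    for cmd in build_export_commands(
        "MyTable", datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 1), 4, accounts
    ):
        print(cmd)
```

The windows are half-open (`>=` lower bound, `<` upper bound), so a record falls into exactly one command. How many windows and how many storage accounts you need depends on your data volume and cluster size, so treat the numbers above as placeholders.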

Upvotes: 4
