Ambrose Leung

How do I undo an ingestion in Azure Data Explorer (Kusto)?

Context: I'm following this guide: https://learn.microsoft.com/en-us/azure/kusto/api/netfx/kusto-ingest-client-examples

I'm using IngestFromStorageAsync. The result includes an IngestionSourceId (a GUID), but I don't know what to do with it; it is not the extent ID.

I had assumed this ID could be used to remove all the records that were imported.

Does anyone know how to undo an ingestion?


Currently, I run .show cluster extents to find the extent IDs, then call .drop extent [id]. Is this the right way to undo an ingestion?
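A minimal Python sketch of this manual approach. It only builds the command text and does not connect to a cluster. The helper name drop_extent_commands and the sample ID are hypothetical; the IDs are assumed to be copied from the ExtentId column of the .show output.

```python
import uuid


def drop_extent_commands(extent_ids: list[str]) -> list[str]:
    """Build one `.drop extent` control command per extent ID (hypothetical helper)."""
    commands = []
    for extent_id in extent_ids:
        # Extent IDs are GUIDs; uuid.UUID raises ValueError for anything that
        # isn't one, which catches copy/paste mistakes before a destructive command.
        normalized = str(uuid.UUID(extent_id))
        commands.append(f".drop extent {normalized}")
    return commands


# Hypothetical extent ID, as it might be copied from the .show output.
for command in drop_extent_commands(["0B0D2E0F-1234-4ABC-9DEF-001122334455"]):
    print(command)  # .drop extent 0b0d2e0f-1234-4abc-9def-001122334455
```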

Answers (1)

Yoni L.

"Undoing" an ingestion essentially means dropping the data that was ingested.

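Dropping data can be done at the granularity of extents (data shards), and extents can be merged with one another at any time (e.g. right after the data was ingested). So by the time you decide to drop it, the extent that held your ingestion's data may also hold data from other ingestions.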
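A toy Python model of why the extent is the unit you drop. Everything here (the Extent class, merge, drop_extent and the ingestion labels) is hypothetical and only mimics the behaviour described above; it is not how Kusto's storage engine works internally. Once two ingestions' extents have been merged, dropping the merged extent removes both.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    extent_id: str
    rows: list[tuple[str, str]]  # (ingestion label, record) pairs


def merge(extents: list[Extent], new_id: str) -> Extent:
    """Combine several extents into one, as a background merge might."""
    return Extent(new_id, [row for extent in extents for row in extent.rows])


def drop_extent(table: list[Extent], extent_id: str) -> list[Extent]:
    """Drop a whole extent: every row in it goes, regardless of which ingestion wrote it."""
    return [extent for extent in table if extent.extent_id != extent_id]


# Two separate ingestions each create their own extent...
e1 = Extent("e1", [("ingestion-A", "row 1"), ("ingestion-A", "row 2")])
e2 = Extent("e2", [("ingestion-B", "row 3")])
# ...which are merged before anyone decides to undo ingestion A.
merged = merge([e1, e2], "e3")
table = [merged]

# Undoing A now means dropping e3, which takes ingestion B's data with it.
lost = {label for label, _ in merged.rows}
table = drop_extent(table, "e3")
print(sorted(lost))  # ['ingestion-A', 'ingestion-B']
print(table)         # []
```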

If you know there's a chance you'll want to drop the data you've just ingested (and you can't fix the ingestion pipeline that leads to those "erroneous"(?) ingestions), one direction you could follow is to use extent tags: tag the extents created by your ingestion so you can identify them later, then drop them.
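A minimal Python sketch of the tag-based approach, building command text only. The table name MyTable and the tag value are hypothetical. The drop-by: prefix follows the convention the extents documentation linked below describes, and, as I understand it, the ".drop extents <| .show table ... extents where tags has ..." form is the pattern that documentation uses for dropping extents by tag; check the exact syntax against the docs for your cluster before running anything destructive.

```python
import json


def _check_tag(tag: str) -> None:
    # Keep the sketch simple: refuse quotes instead of escaping them.
    if '"' in tag or "'" in tag:
        raise ValueError("tag must not contain quote characters")


def tagged_ingest_clause(tag: str) -> str:
    """with(...) clause that attaches `tag` to the extents this ingestion creates."""
    _check_tag(tag)
    # The tags property is a JSON array of strings, serialized into a string literal.
    return f"with (tags = '{json.dumps([tag])}')"


def drop_by_tag_command(table: str, tag: str) -> str:
    """Drop every extent in `table` that carries `tag`."""
    _check_tag(tag)
    return f'.drop extents <| .show table {table} extents where tags has "{tag}"'


tag = "drop-by:batch-2019-06-01"  # hypothetical tag for one ingestion batch
print(tagged_ingest_clause(tag))
print(drop_by_tag_command("MyTable", tag))
```

The tag is chosen at ingestion time, so the later drop does not depend on knowing extent IDs; the performance warnings in the excerpt below about per-ingestion tags still apply.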

More information can be found here: https://learn.microsoft.com/en-us/azure/kusto/management/extents-overview. If you do choose to use tags for this purpose (and can't avoid situations where you need to "undo" your ingestions), please make sure you read the "performance notes" in that doc.


Excerpt from the linked documentation:

'ingest-by:' extent tags

Tags that start with an ingest-by: prefix can be used to ensure that data is only ingested once. You can set the ingestIfNotExists property on an ingest command to prevent the data from being ingested if an extent with this specific ingest-by: tag already exists. The values for both tags and ingestIfNotExists are arrays of strings, serialized as JSON.
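To make the JSON serialization concrete, here is a small Python sketch (the helper name ingest_once_clause is hypothetical) that produces the combined properties for one key: the ingest-by: tag and the ingestIfNotExists value set to the same string, each written as a JSON array inside a single-quoted string.

```python
import json


def ingest_once_clause(key: str) -> str:
    """with(...) clause that tags the new extent and skips the ingestion if an
    extent with the same ingest-by: tag already exists (hypothetical helper)."""
    if "'" in key:
        # The JSON sits inside a single-quoted literal; refuse rather than escape.
        raise ValueError("key must not contain single quotes")
    if_not_exists = json.dumps([key])        # e.g. ["2016-02-17"]
    tags = json.dumps([f"ingest-by:{key}"])  # e.g. ["ingest-by:2016-02-17"]
    return f"with (ingestIfNotExists = '{if_not_exists}', tags = '{tags}')"


print(ingest_once_clause("2016-02-17"))
# with (ingestIfNotExists = '["2016-02-17"]', tags = '["ingest-by:2016-02-17"]')
```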

The following example ingests data only once. The 2nd and 3rd commands do nothing:

.ingest ... with (tags = '["ingest-by:2016-02-17"]')

.ingest ... with (ingestIfNotExists = '["2016-02-17"]')

.ingest ... with (ingestIfNotExists = '["2016-02-17"]', tags = '["ingest-by:2016-02-17"]')

Note: Generally, an ingest command is likely to include both an ingest-by: tag and an ingestIfNotExists property, set to the same value, as shown in the 3rd command above.
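The rule the three commands follow can be modelled in a few lines of Python. This is a toy in-memory model, not how Kusto implements the check: it keeps the set of tags already present in the table and assumes that a match on any ingestIfNotExists value is enough to skip the ingestion.

```python
def ingest(existing_tags: set[str], tags=(), ingest_if_not_exists=()) -> bool:
    """Return True if the (simulated) ingestion happens, False if it is skipped."""
    for value in ingest_if_not_exists:
        # ingestIfNotExists values are compared against existing ingest-by: tags.
        if f"ingest-by:{value}" in existing_tags:
            return False
    # The new extent's tags become visible to later commands.
    existing_tags.update(tags)
    return True


table_tags: set[str] = set()
results = [
    ingest(table_tags, tags=["ingest-by:2016-02-17"]),
    ingest(table_tags, ingest_if_not_exists=["2016-02-17"]),
    ingest(table_tags, ingest_if_not_exists=["2016-02-17"], tags=["ingest-by:2016-02-17"]),
]
print(results)  # [True, False, False]
```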

Warning:

  • Overusing ingest-by tags isn't recommended.
  • If the pipeline feeding Kusto is known to produce duplicate data, we recommend resolving those duplicates as much as possible before ingesting the data into Kusto.
  • Attempting to set a unique ingest-by tag for each ingestion call might severely impact performance.
  • If such tags aren't required for some period of time after the data is ingested, we recommend that you drop extent tags.
    • To drop the tags automatically, you can set an extent tags retention policy.
