Reputation: 21
We are currently using Azure Data Explorer (ADX) to store sensor data coming from several IoT devices. The data spans several years.
Our issue is that from time to time a given sensor, or even a set of sensors, malfunctions, resulting in wrong values being ingested.
I am thinking about reworking our ingestion to use extent tags more smartly. We already used tags a bit, but in the end several tags were merged into the same extents. Therefore, if one drops extents by their tags:
.drop extents <| .show table MyTable extents where tags has "capturedevice01"
valid data may be dropped alongside the bad data.
So I am thinking about using drop-by tags to limit this phenomenon.
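For illustration, a minimal sketch of what tagging at ingestion could look like. It assumes a hypothetical schema MyTable(Timestamp: datetime, DeviceId: string, Value: real) and one drop-by tag per device; the tag name and the sample row are made up. The `//` lines are annotations, and each command is meant to be run separately.

```kusto
// Ingest a sample row; the resulting extent is tagged with a drop-by tag.
.ingest inline into table MyTable with (tags = '["drop-by:capturedevice01"]') <|
2024-01-01T00:00:00Z,capturedevice01,21.5

// List the extents that carry this tag.
.show table MyTable extents where tags has "drop-by:capturedevice01"
```

Extents that carry different drop-by tags are not merged with each other, which is what keeps one device's data from ending up in the same extent as another's.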
Nevertheless, the documentation says: "Avoid excessive use of drop-by tags."
How much is "excessive"?
For our use case we would have either 20 or 4000 drop-by tags, depending on how we define them. Is either of these values excessive?
Another question is about the performance of drop table extent tags. Any idea how it would behave when dropping tens of tags, for up to a few GB of data (compressed; the raw data is 5 to 10 times larger)?
Upvotes: 0
Views: 117
Reputation: 25895
Therefore if one performs a .drop table extent tags, valid data may be dropped alongside false one.
Are you dropping extents (based on their tags) or extent tags? From your phrasing it sounds like you're dropping extents, in which case the command quoted here is the wrong one.
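To make the distinction concrete, here is a sketch of the two commands side by side. The table name MyTable comes from the question; the tag name is hypothetical. The `//` lines are annotations, and each command is meant to be run separately.

```kusto
// Drops the extents themselves: the data in them is removed from the table.
.drop extents <| .show table MyTable extents where tags has "drop-by:capturedevice01"

// Drops only the tag: the extents and their data stay in the table.
.drop extent tags from table MyTable ('drop-by:capturedevice01')
```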
How much is "excessive"?
There are two main reasons for this recommendation:
1. Not placing harsh constraints on the system's ability to merge extents. If you do, you'll end up with smaller-than-ideal extents, and query performance could degrade.
2. Not inflating metadata too much. Extent tags are stored as part of the database metadata; if the database metadata gets very large, that too may result in some performance degradation.
The key is to not overdo it, and use the available tools according to your scenario.
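One way to check whether a tagging scheme is pushing the table toward many small extents is to summarize the extent list after running with it for a while. This is only a sketch; the column aliases are made up, and what counts as "too small" depends on your cluster and data.

```kusto
.show table MyTable extents
| summarize ExtentCount      = count(),                  // total extents in the table
            TaggedExtents    = countif(isnotempty(Tags)), // extents carrying at least one tag
            AvgRowsPerExtent = avg(RowCount),
            MinRowsPerExtent = min(RowCount)
```

A rising extent count together with a falling average row count after introducing drop-by tags would suggest that the tags are limiting merges.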
Another question is about the performance of drop table extent tags. Any idea how it would behave when dropping tens of tags, for up to a few GB of data (compressed; the raw data is 5 to 10 times larger)?
Commands to drop extents or to drop tags (I wasn't sure which one you're referring to) both act on metadata, not data. They should be lightweight, as long as you're not processing tens or hundreds of thousands of extents (shards) in a single command.
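If you want to see how many extents a given drop would touch before running it, the whatif option reports the matching extents instead of dropping them. A sketch, with a hypothetical tag name:

```kusto
.drop extents whatif <| .show table MyTable extents where tags has "drop-by:capturedevice01"
```

Removing the whatif keyword turns the same statement into the actual drop.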
For our use case we would have around 4000 drop-by tags. Is that excessive?
It's hard to say without knowing -
either way, the guidelines provided above should steer you in the right direction.
Upvotes: 0