Reputation: 4333
I am trying to count distinct sessionIds from a measurement. sessionId
being a tag, I count the distinct entries in a "parent" query, since distinct()
doesn't works on tags.
In the subquery, I use a group by sessionId limit 1
to still benefit from the index (if there is a more efficient technique, I have ears wide open but I'd still like to understand what's going on).
I have those two variants:
> select count(distinct(sessionId)) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 3757
> select count(sessionId) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 4206
To my understanding, those should return the same number, since group by sessionId limit 1
already returns distinct sessionIds (in the form of groups).
And indeed, if I execute:
select * from UserSession group by sessionId limit 1
I have 3757 results (groups), not 4206.
In fact, as soon as I put this in a subquery and re-select fields in a parent query, some sessionIds have multiple occurrences in the final result. Not always, since there is 17549 rows in total, but some are.
This is the sign that the limit 1
is somewhat working, but some sessionId still get multiple entries when re-selected. Maybe some kind of undefined behaviour?
Upvotes: 0
Views: 151
Reputation: 1368
I can confirm that I get the same result. In my experience using nested queries does not always deliver what you expect/want.
Depending on how you use this you could retrieve a list of all values for a tag with:
SHOW TAG VALUES FROM UserSession WITH KEY=sessionId
Or to get the cardinality (number of distinct values for a tag):
SHOW TAG VALUES EXACT CARDINALITY FROM UserSession WITH KEY=sessionId.
Which will return a single row with a single column count
, containing a number. You can remove the EXACT
modifier if you don't need to be exact about the result: SHOW TAG VALUES CARDINALITY on Influx Documentation.
Upvotes: 1