Reputation: 406
I'm running a CloudWatch log insights query on a single log stream that corresponds to a single Python AWS Lambda function. This function logs a unique line corresponding to the key in s3 that it is processing. It logs this line once at the beginning of the invocation. The only condition where it won't log this line is if it fails before it even reads the event.
The query is:
parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
by datefloor(@timestamp, 1d) as @_datefloor
| sort @_datefloor asc
The two regular expressions in this query will parse the full key of the s3 file being processed. In this particular problem and in general, my understanding is that the count(...) of any quantity minus the count_distinct(...) of the same quantity should always be greater than or equal to zero.
For several of the days in the results, it is a negative number.
I thought I might be misunderstanding the correct usage of datefloor()
, so I tried running the following query:
parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
The result was -20,347.
At this point the only scenarios I can see are
Upvotes: 0
Views: 1800
Reputation: 406
I have discovered that the count_distinct
function in AWS Log Insights queries doesn't really return a distinct count! As per the documentation
Returns the number of unique values for the field. If the field has very high cardinality (contains many unique values), the value returned by count_distinct is just an approximation.
Apparently I can't just assume that a function returns an accurate result.
Upvotes: 1