How to get distinct count in aggregate

Question

I simply want a distinct-count aggregation, i.e. the number of unique values in each group.

I have this code:

data_frame = data_frame.group_by(:job_id)
                       .aggregate(job_id: :max, bid_id: :count)

I want something like this:

data_frame = data_frame.group_by(:job_id)
                       .aggregate(job_id: :max, bid_id: :distinct_count)

I know no statistical method like that is implemented yet. Is there any other way?

janpeterka · Accepted Answer

I found one way to do this:

data_frame = data_frame.group_by(:job_id)
                       .aggregate(job_id: :max,
                                  bid_id: lambda { |x| x.uniq.size })

Or, equivalently, with the shorter lambda literal syntax:

data_frame = data_frame.group_by(:job_id)
                       .aggregate(job_id: :max,
                                  bid_id: ->(x) { x.uniq.size })

I am not sure if it is the right way, but it seems to work.
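
To show what that lambda computes per group, here is a minimal plain-Ruby sketch that does not use the data frame library at all. The sample rows and the names rows, distinct_count, and distinct_bids are made up for illustration, and it assumes the lambda is handed the bid_id values of one group at a time:

```ruby
# Hypothetical sample rows; the values are made up for illustration.
rows = [
  { job_id: 1, bid_id: 10 },
  { job_id: 1, bid_id: 10 },
  { job_id: 1, bid_id: 11 },
  { job_id: 2, bid_id: 20 }
]

# Same body as the lambda passed to aggregate: drop duplicates, then count.
distinct_count = ->(values) { values.uniq.size }

# Group rows by job_id, then apply the lambda to each group's bid_id values.
distinct_bids = rows
  .group_by { |row| row[:job_id] }
  .transform_values { |group| distinct_count.call(group.map { |row| row[:bid_id] }) }

p distinct_bids
```

With this data, job 1 ends up with two distinct bids (10 and 11) and job 2 with one, while a plain count would report three for job 1.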

This pandas solution helped me.
