yuvraj rajpurohit

Reputation: 69

Aggregate on multiple columns in spark dataframe (all combination)

I want to get a count of customers based on every combination of the columns I have in a DataFrame.

For example, suppose I have a DataFrame with 5 columns:

id, col1, col2, col3, cust_id

I need the customer count for every combination:

    id, col1, count(cust_id)
    id, col1, col2, count(cust_id)
    id, col1, col3, count(cust_id)
    id, col1, col2, col3, count(cust_id)
    id, col2, count(cust_id)
    id, col2, col3, count(cust_id)

And so on for all the remaining combinations.

It is very tedious to do this separately, passing each different combination to the DataFrame's groupBy function and then aggregating the customer count.

Is there any way to achieve this and then combine all the results into one DataFrame, so that we can write everything to a single output file?
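For reference, the brute-force approach I am trying to avoid looks roughly like this (just a sketch; the combinations listed are hand-picked, not exhaustive):

    import org.apache.spark.sql.functions.count

    val combos = Seq(
      Seq("id", "col1"),
      Seq("id", "col1", "col2"),
      Seq("id", "col2", "col3")
      // ...one entry per combination, which gets unwieldy fast
    )

    val results = combos.map { cols =>
      df.groupBy(cols.head, cols.tail: _*).agg(count("cust_id"))
    }
    // Each result has a different schema, so even unioning them
    // into one DataFrame for a single output file is awkward.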

It looks a bit complex to me; I would really appreciate it if anyone could provide a solution. Please let me know if more details are required.

Thanks a lot.

Upvotes: 0

Views: 1224

Answers (1)

Alper t. Turker

Reputation: 35229

It is possible, and it is called cube:

    import org.apache.spark.sql.functions.count

    df.cube("id", "col1", "col2", "col3").agg(count("cust_id"))
      .na.drop(minNonNulls = 3)  // to exclude some combinations
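To see why the .na.drop trick works, here is a minimal sketch (assuming a SparkSession named spark; the toy data is made up): cube emits one row per grouping combination, with null in every column that was aggregated away.

    import spark.implicits._
    import org.apache.spark.sql.functions.count

    val df = Seq(
      (1, "a", "x", "p", 100),
      (1, "a", "x", "q", 101),
      (2, "b", "y", "p", 102)
    ).toDF("id", "col1", "col2", "col3", "cust_id")

    df.cube("id", "col1", "col2", "col3")
      .agg(count("cust_id").as("cust_count"))
      .show(false)
    // Rows where, say, col2 and col3 are null represent the (id, col1)
    // grouping. Since cust_count itself is never null, minNonNulls = 3
    // keeps only rows with at least two grouping columns filled in.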

The SQL version also provides GROUPING SETS, which can be more efficient than .na.drop.
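A rough sketch of the GROUPING SETS variant (the view name t and the enumerated sets below are illustrative, not exhaustive):

    df.createOrReplaceTempView("t")

    spark.sql("""
      SELECT id, col1, col2, col3, count(cust_id) AS cust_count
      FROM t
      GROUP BY id, col1, col2, col3
      GROUPING SETS ((id, col1), (id, col1, col2), (id, col2, col3))
    """).show(false)
    // Only the enumerated grouping sets are computed, so unwanted
    // combinations never materialize and no na.drop pass is needed.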

Upvotes: 1
