How to groupby and then aggregate on multiple columns

Question

I am using Pandas on Spark. I need to groupby A and B and then aggregate to return a list of map where keys are C and values are D Sample input:

         A      B        C           D
0        7 201806851 0006378110  2223982011
1        7  6378110 0006378110  2223982011
2        7 201806851  201806851  20972475011
3        7  6378110  201806851  20972475011

Sample output:

         A      B        C
0        7  6378110 [[0006378110, 2223982011], [201806851, 20972475011]]
1        7 201806851 [[0006378110, 2223982011], [201806851, 20972475011]]

This is my code. It's giving the error, assert len(key) == len(that_column_labels) AssertionError on the first line. Any idea?

seed_data["C"] = seed_data[["C", "D"]].to_dict('records')
seed_data = (seed_data
                     .groupby(["A", "B"])["C"]
                     .apply(list).reset_index(name="C"))

Tried a few things, like extracting columns C and D into a separate dataframe, convert to dict and then using it as an aggregate column. But getting assertion error.

How to groupby and then aggregate on multiple columns

Answers (1)

Related Questions