senko

Reputation: 157

Pandas: how to group by without losing other column information

I have a DataFrame like this:

      source  destination  weight  partition
0          1            2     193          7
1          1           22    2172          7
2          2            3     188          7
3          2            1     193          7
4          2            4     403          7
...      ...          ...     ...        ...
8865    3351         3352     719          4
8866    3351         2961    6009          4
8867    3352         3351     719          4
8868    3353         1540     128          2
8869    3353         1377     198          2

This is the edge list of a graph; partition is the source vertex's partition. I want to group by source and collect the destinations to find all neighbours of each vertex. I tried this:

group = result.groupby("source")["destination"].apply(list).reset_index(name="neighbours")

and the result is:

      source          neighbours
0          1             [2, 22]
1          2           [3, 1, 4]
2          3                 [2]
3          4          [21, 2, 5]
4          5              [4, 8]
...      ...                 ...
3348    3349  [3350, 3345, 3324]
3349    3350              [3349]
3350    3351  [2896, 3352, 2961]
3351    3352              [3351]
3352    3353        [1540, 1377]

As you can see, I'm losing the partition information. Is there a way to keep this column as well?

Upvotes: 1

Views: 532

Answers (1)

jezrael

Reputation: 863226

If you need to group by two columns, use:

group = (result.groupby(["source", "partition"])["destination"]
               .apply(list)
               .reset_index(name="neighbours"))
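Grouping by both columns gives one row per source only if every source has a single partition value, which is what the question describes. A minimal sketch of how that could be checked first; the small `result` frame is a sample built from the first rows shown in the question, and the names `partitions_per_source` and `one_partition_each` are only illustrative:

```python
import pandas as pd

# Sample edge list copied from the first five rows of the question.
result = pd.DataFrame({
    "source": [1, 1, 2, 2, 2],
    "destination": [2, 22, 3, 1, 4],
    "weight": [193, 2172, 188, 193, 403],
    "partition": [7, 7, 7, 7, 7],
})

# Count distinct partition values per source vertex.
partitions_per_source = result.groupby("source")["partition"].nunique()

# True when no source appears with more than one partition.
one_partition_each = bool(partitions_per_source.eq(1).all())
print(one_partition_each)
```

If the check fails for your data, grouping by `["source", "partition"]` will split a source's neighbours across several rows.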

If you need the first value of the partition column per source, aggregate with GroupBy.agg and first:

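# DataFrame.reset_index() has no `name` argument, so rename the column instead
group = (result.groupby("source")
               .agg({"partition": "first", "destination": list})
               .rename(columns={"destination": "neighbours"})
               .reset_index())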
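To compare the two approaches side by side, here is a self-contained sketch on a small sample built from the first rows of the question. The variable names `by_both` and `by_first` are only illustrative, and the second variant renames the aggregated column because `reset_index(name=...)` is only available on a Series:

```python
import pandas as pd

# Sample edge list copied from the first five rows of the question.
result = pd.DataFrame({
    "source": [1, 1, 2, 2, 2],
    "destination": [2, 22, 3, 1, 4],
    "weight": [193, 2172, 188, 193, 403],
    "partition": [7, 7, 7, 7, 7],
})

# Variant 1: group by both columns; the result of apply is a Series,
# so reset_index(name=...) can name the list column.
by_both = (result.groupby(["source", "partition"])["destination"]
                 .apply(list)
                 .reset_index(name="neighbours"))

# Variant 2: group by source only; keep the first partition per group
# and collect the destinations into a list.
by_first = (result.groupby("source")
                  .agg({"partition": "first", "destination": list})
                  .rename(columns={"destination": "neighbours"})
                  .reset_index())

print(by_both)
print(by_first)
```

On this sample both variants should produce the same source, partition and neighbours columns, since each source has exactly one partition.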

Upvotes: 0
