Reputation: 157
I have a DataFrame like this:
source destination weight partition
0 1 2 193 7
1 1 22 2172 7
2 2 3 188 7
3 2 1 193 7
4 2 4 403 7
... ... ... ... ...
8865 3351 3352 719 4
8866 3351 2961 6009 4
8867 3352 3351 719 4
8868 3353 1540 128 2
8869 3353 1377 198 2
It is the edge list of a graph, and partition holds the partition of the source vertex. I want to group by source and collect the destinations to find all neighbours of each vertex. I tried this:
group = result.groupby("source")["destination"].apply(list).reset_index(name="neighbours")
and the result is:
source neighbours
0 1 [2, 22]
1 2 [3, 1, 4]
2 3 [2]
3 4 [21, 2, 5]
4 5 [4, 8]
... ... ...
3348 3349 [3350, 3345, 3324]
3349 3350 [3349]
3350 3351 [2896, 3352, 2961]
3351 3352 [3351]
3352 3353 [1540, 1377]
As you can see, the partition information is lost here. Is there a way to keep this column as well?
Upvotes: 1
Views: 532
Reputation: 863226
If you need to group by both columns, use:
group = (result.groupby(["source", "partition"])["destination"]
.apply(list)
.reset_index(name="neighbours"))
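A minimal runnable sketch of the two-column grouping, assuming a small made-up edge list with the same column names as in the question (the values are illustrative, not the asker's data):

```python
import pandas as pd

# Made-up sample edge list; only the column names come from the question.
result = pd.DataFrame({
    "source":      [1, 1, 2, 2, 2, 3],
    "destination": [2, 22, 3, 1, 4, 2],
    "weight":      [193, 2172, 188, 193, 403, 188],
    "partition":   [7, 7, 7, 7, 7, 4],
})

# Grouping by both keys keeps partition as a column after reset_index.
group = (result.groupby(["source", "partition"])["destination"]
               .apply(list)
               .reset_index(name="neighbours"))
print(group)
```

Note that if a source ever appeared with more than one partition value, this would produce one row per (source, partition) pair rather than one row per source.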
If you need the first value of the partition column for each source, aggregate with GroupBy.agg and first:
# DataFrame.reset_index() has no name= argument, so name the output columns in agg
group = (result.groupby("source")
         .agg(partition=("partition", "first"),
              neighbours=("destination", list))
         .reset_index())
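As a runnable sketch of the "first value" approach, the snippet below uses named aggregation so the output columns get their final names directly. The sample data is made up, and the consistency check is an extra assumption-driven step: taking `first` is only safe if each source has a single partition value.

```python
import pandas as pd

# Made-up sample edge list; only the column names come from the question.
result = pd.DataFrame({
    "source":      [1, 1, 2, 2, 2, 3],
    "destination": [2, 22, 3, 1, 4, 2],
    "weight":      [193, 2172, 188, 193, 403, 188],
    "partition":   [7, 7, 7, 7, 7, 4],
})

# Optional sanity check: does every source map to exactly one partition?
consistent = result.groupby("source")["partition"].nunique().eq(1).all()

# One row per source: first partition value plus the list of destinations.
group = (result.groupby("source")
               .agg(partition=("partition", "first"),
                    neighbours=("destination", list))
               .reset_index())
print(consistent)
print(group)
```

If `consistent` turns out to be false for real data, grouping by both `source` and `partition` is probably the safer choice.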
Upvotes: 0