Reputation: 158
Given a DataFrame like the following:
col_a col_b col_c
0 0 1 a
1 0 2 b
2 0 3 c
3 1 4 d
4 1 5 e
5 1 6 f
6 2 7 g
7 2 8 h
8 2 9 i
I'm looking for a way to create a list of sublists, where each sublist contains pair-tuples for the col_b
and col_c
values corresponding to the col_a
value. In this example, the required output is:
[[(1, 'a'), (2, 'b'), (3, 'c')],
[(4, 'd'), (5, 'e'), (6, 'f')],
[(7, 'g'), (8, 'h'), (9, 'i')]]
i.e. the first sublist contains the three tuples where col_a
is equal to 0
, and so on.
I managed to do it using a for loop like so:
main_list = []
for i in df["col_a"].unique():
small_df = df.loc[df["col_a"] == i]
sublist = small_df.drop(columns=["col_a"]).to_records(index=False).tolist()
main_list.append(sublist)
But this solution is quite clunky and take an extremely long time to execute on a large df. I'm wondering if there is a faster way of doing this?
Upvotes: 3
Views: 134
Reputation: 71689
Here is one way to approach the problem:
df['p'] = [*zip(df['col_b'], df['col_c'])]
l = df.groupby('col_a')['p'].agg(list).tolist()
print(l)
[[(1, 'a'), (2, 'b'), (3, 'c')],
[(4, 'd'), (5, 'e'), (6, 'f')],
[(7, 'g'), (8, 'h'), (9, 'i')]]
Upvotes: 3
Reputation: 120439
Use groupby
:
totuple = lambda x: [(l, r) for l, r in zip(x['col_b'], x['col_c'])]
out = df.groupby('col_a')[['col_b', 'col_c']].apply(totuple).tolist()
print(out)
# Output
[[(1, 'a'), (2, 'b'), (3, 'c')],
[(4, 'd'), (5, 'e'), (6, 'f')],
[(7, 'g'), (8, 'h'), (9, 'i')]]
Upvotes: 2