Fast way to turn Pandas DataFrame into nested list of tuples

Question

Given a DataFrame like the following:

 col_a col_b col_c

0   0   1   a
1   0   2   b
2   0   3   c
3   1   4   d
4   1   5   e
5   1   6   f
6   2   7   g
7   2   8   h
8   2   9   i

I'm looking for a way to create a list of sublists, where each sublist contains pair-tuples for the col_b and col_c values corresponding to the col_a value. In this example, the required output is:

[[(1, 'a'), (2, 'b'), (3, 'c')],
 [(4, 'd'), (5, 'e'), (6, 'f')],
 [(7, 'g'), (8, 'h'), (9, 'i')]]

i.e. the first sublist contains the three tuples where col_a is equal to 0, and so on.

I managed to do it using a for loop like so:

main_list = []
for i in df["col_a"].unique():
    small_df = df.loc[df["col_a"] == i]
    sublist = small_df.drop(columns=["col_a"]).to_records(index=False).tolist()
    main_list.append(sublist)

But this solution is quite clunky and take an extremely long time to execute on a large df. I'm wondering if there is a faster way of doing this?

Corralien · Accepted Answer

Use groupby:

totuple = lambda x: [(l, r) for l, r in zip(x['col_b'], x['col_c'])]
out = df.groupby('col_a')[['col_b', 'col_c']].apply(totuple).tolist()
print(out)

# Output
[[(1, 'a'), (2, 'b'), (3, 'c')],
 [(4, 'd'), (5, 'e'), (6, 'f')],
 [(7, 'g'), (8, 'h'), (9, 'i')]]

Fast way to turn Pandas DataFrame into nested list of tuples

Answers (2)

Related Questions