Alejandro A

Reputation: 1190

Python group by is creating list of lists instead of a single list

I have this dataframe:

data = pd.DataFrame({'UserName':['LoveLearn','JakeSanz','LoveLearn'],'Alias':['LL','JS','LL'],'ClassRoom1':['A2','3B','C2'],'ClassRoom2':['B5','E6','D2'],'Points':[1,6,2]})

I want to group by UserName, Alias and sum the points (done) and get a list of all the classrooms a user has attended.

First I filter the classroom columns by name:

classroom_columns = list(data.filter(regex='^ClassRoom').columns)

I group the data:

grouped_data = data.groupby(['UserName','Alias'])

Define this function:

def group_metrics(g_df,class_cols):
    return pd.DataFrame({'TotalPoints':g_df['Points'].sum(),'TotalClassRooms':g_df.apply(lambda x: x[class_cols].values.tolist())})

But after calling the function

group_metrics(grouped_data,classroom_columns)

I get a list of lists in the TotalClassRooms column:

    UserName Alias  TotalPoints       TotalClassRooms
0   JakeSanz    JS            6            [[3B, E6]]
1  LoveLearn    LL            3  [[A2, B5], [C2, D2]]

I want a single flat list per user instead.

Upvotes: 2

Views: 98

Answers (1)

Henry Ecker

Reputation: 35686

You can call np.ravel before tolist to flatten each group's 2D block of values into a 1D array:

import numpy as np
import pandas as pd


def group_metrics(g_df, class_cols):
    return pd.DataFrame({
        'TotalPoints': g_df['Points'].sum(),
        'TotalClassRooms': g_df.apply(
            lambda x: np.ravel(x[class_cols]).tolist())
    })

Or call flatten on the underlying array:

def group_metrics(g_df, class_cols):
    return pd.DataFrame({
        'TotalPoints': g_df['Points'].sum(),
        'TotalClassRooms': g_df.apply(
            lambda x: x[class_cols].values.flatten().tolist())
    })

group_metrics(grouped_data, classroom_columns)
                 TotalPoints   TotalClassRooms
UserName  Alias                               
JakeSanz  JS               6          [3B, E6]
LoveLearn LL               3  [A2, B5, C2, D2]
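
For anyone who wants to run this end to end, here is a minimal self-contained sketch using the data from the question. The function name group_metrics_flat and its keys parameter are made up for illustration and are not part of the original code. It also selects the classroom columns on the groupby object before calling apply, and uses to_numpy().ravel() instead of np.ravel, which should behave the same way here.

```python
import pandas as pd

data = pd.DataFrame({
    'UserName': ['LoveLearn', 'JakeSanz', 'LoveLearn'],
    'Alias': ['LL', 'JS', 'LL'],
    'ClassRoom1': ['A2', '3B', 'C2'],
    'ClassRoom2': ['B5', 'E6', 'D2'],
    'Points': [1, 6, 2],
})


def group_metrics_flat(df, keys, class_cols):
    """Hypothetical helper: sum Points and collect classrooms as one flat list."""
    grouped = df.groupby(keys)
    total_points = grouped['Points'].sum()
    # Each group is a 2D block (rows x classroom columns).
    # ravel() turns it into 1D, so tolist() gives a flat list
    # rather than one inner list per row.
    classrooms = grouped[class_cols].apply(
        lambda g: g.to_numpy().ravel().tolist())
    return pd.DataFrame({
        'TotalPoints': total_points,
        'TotalClassRooms': classrooms,
    })


classroom_columns = [c for c in data.columns if c.startswith('ClassRoom')]
result = group_metrics_flat(data, ['UserName', 'Alias'], classroom_columns)
print(result)
```

The values come out row by row within each group, so for LoveLearn the classrooms of the first row come before those of the second row.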

Upvotes: 1
