pandas groupby aggregate for columns having lists of items returns string and not list

Question

Consider the following dataframe:

I want to group this dataframe, so I applied the following logic:

import pandas as pd

df = pd.read_csv('sample_DF.tsv', sep='\t')

df.groupby('col3', as_index=False).aggregate(lambda x: list(x)).reset_index(drop=True)

But the output does not combine the lists in columns 4 and 5 into lists of lists; the values come back as strings instead.

When I tried the same with a dummy dataframe like:

df = pd.DataFrame({'P':[['a','b','c'],['x','y'],[1,2,3],['a','b','c'],['x','y']],'Q':['tom','dick','harry','tom','dick'],'R':[10,12,15,10,12]})

df.groupby('R',as_index=False).aggregate(lambda x:list(x))

I get the desired result: column P comes back as a list of lists (and column Q as a list of strings).

So something about the data in sample_DF.tsv must be causing groupby to behave differently.

Please let me know what could be the reason.

jezrael · Accepted Answer

Lists read from a file are stored as strings, so you need to convert them back to Python lists with ast.literal_eval. The if-else wraps any value that is not a list literal in a one-element list:

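import ast
import pandas as pd

df = pd.read_csv('sample_DF.tsv', sep="\t", index_col=None, parse_dates=False)

cols = ['TFactor','miRNA']
# Parse list-like strings; wrap any other value in a one-element list.
# Note: applymap is deprecated since pandas 2.1 in favour of DataFrame.map.
df[cols] = df[cols].applymap(lambda x: ast.literal_eval(x) if str(x).startswith('[') else [x])
print(df)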

    Gene stable ID   Genes  Chromo community                TFactor  \
0  ENSG00000148584    A1CF      10      com2                  [INV]   
1  ENSG00000175899     A2M      12      com1  [STAT3, TFCP2, NFKB1]   
2  ENSG00000166535   A2ML1      12      com9                  [INV]   
3  ENSG00000128274  A4GALT      22      com4                  [INV]   
4  ENSG00000081760    AACS      12      com3                  [INV]   

                                               miRNA  
0  [miR-374-5p/655-3p, miR-758, miR-374c-5p, miR-...  
1                                              [INV]  
2                                              [INV]  
3                                              [INV]  
4                              [miR-137-3p, miR-137]
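
To check the whole flow without the original file, here is a minimal sketch. It assumes a small in-memory TSV: the column names follow the output above, but the rows (including the GENE3 placeholder) are made up for illustration, and to_list is a hypothetical helper name. It shows that the cells are strings right after read_csv, converts them, and then runs the groupby from the question.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for sample_DF.tsv; the rows are invented for illustration.
tsv = (
    "Genes\tcommunity\tTFactor\tmiRNA\n"
    "A2M\tcom1\t['STAT3', 'TFCP2', 'NFKB1']\tINV\n"
    "AACS\tcom3\tINV\t['miR-137-3p', 'miR-137']\n"
    "GENE3\tcom1\tINV\tINV\n"
)
df = pd.read_csv(io.StringIO(tsv), sep="\t")

# Right after reading, a "list" cell is really a plain string.
type_before = type(df.loc[0, "TFactor"])


def to_list(value):
    """Parse list-like strings; wrap any other value in a one-element list."""
    text = str(value)
    return ast.literal_eval(text) if text.startswith("[") else [value]


for col in ["TFactor", "miRNA"]:
    df[col] = df[col].apply(to_list)

type_after = type(df.loc[0, "TFactor"])

# Same aggregation as in the question, now applied to real Python lists.
grouped = df.groupby("community", as_index=False).aggregate(lambda x: list(x))

print(type_before, type_after)
print(grouped)
```

Once the cells are real lists, the aggregation yields lists of lists for TFactor and miRNA, which is the behaviour seen with the dummy dataframe in the question.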
