Reputation: 426
Consider the following dataframe:
I want to group this dataframe by col3, so I applied the following logic:
df = pd.read_csv('sample_DF.tsv',sep='\t')
df.groupby('col3',as_index=False).aggregate(lambda x:list(x)).reset_index(drop=True)
But the output doesn't combine the lists into a list of lists for columns 4 and 5.
When I tried the same with a dummy dataframe like:
df = pd.DataFrame({'P':[['a','b','c'],['x','y'],[1,2,3],['a','b','c'],['x','y']],'Q':['tom','dick','harry','tom','dick'],'R':[10,12,15,10,12]})
df.groupby('R',as_index=False).aggregate(lambda x:list(x))
I get the desired result, where lists of lists are returned for columns P and Q.
This suggests there is something peculiar about the sample_DF.tsv data that makes groupby behave differently on it.
What could be the reason?
Upvotes: 0
Views: 150
Reputation: 863301
Because the lists were read from a file, they are stored as strings. You need to convert them to Python lists with ast.literal_eval, using an if-else expression to wrap any value that is not a list in a one-element list:
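import ast
import pandas as pd
df = pd.read_csv('sample_DF.tsv', sep="\t", index_col=None, parse_dates=False)
cols = ['TFactor','miRNA']
# Parse cells that look like lists; wrap any other value in a one-element list.
# Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map.
df[cols] = df[cols].applymap(lambda x: ast.literal_eval(x) if str(x).startswith('[') else [x])
print (df)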
    Gene stable ID   Genes  Chromo  community                TFactor  \
0  ENSG00000148584    A1CF      10       com2                  [INV]
1  ENSG00000175899     A2M      12       com1  [STAT3, TFCP2, NFKB1]
2  ENSG00000166535   A2ML1      12       com9                  [INV]
3  ENSG00000128274  A4GALT      22       com4                  [INV]
4  ENSG00000081760    AACS      12       com3                  [INV]

                                               miRNA
0  [miR-374-5p/655-3p, miR-758, miR-374c-5p, miR-...
1                                              [INV]
2                                              [INV]
3                                              [INV]
4                              [miR-137-3p, miR-137]
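To see the whole effect end to end, here is a minimal sketch. The in-memory TSV is a made-up stand-in for sample_DF.tsv (its rows, the SP1 value and the helper name to_list are assumptions for illustration), and it assumes list cells are written with quoted items such as ['STAT3', 'TFCP2']. It shows that the cell is a plain str right after reading, and that grouping gives a list of lists once the column has been converted:

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for sample_DF.tsv: list cells are stored as text.
tsv = (
    "Genes\tcommunity\tTFactor\n"
    "A1CF\tcom2\tINV\n"
    "A2M\tcom1\t['STAT3', 'TFCP2', 'NFKB1']\n"
    "AACS\tcom1\t['SP1']\n"
)


def to_list(cell):
    """Parse a stringified list; wrap any other value in a one-item list."""
    text = str(cell)
    return ast.literal_eval(text) if text.startswith("[") else [cell]


df = pd.read_csv(io.StringIO(tsv), sep="\t")

# Straight after reading, the "list" is only a string that looks like one.
first_raw = df.loc[1, "TFactor"]
print(type(first_raw))  # <class 'str'>

# Convert the column to real Python lists.
df["TFactor"] = df["TFactor"].map(to_list)

# Same aggregation as in the question, now producing a list of lists.
grouped = df.groupby("community", as_index=False).agg(lambda x: list(x))
print(grouped)
```

Without the conversion step, the same groupby collects the strings as they are, so each group holds a list of strings rather than a list of lists.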
Upvotes: 1