Reputation: 2439
I have a data set structured like this:
mydic = {'2017-9-11': {'Type1': [15, 115452.0, 3], 'Type2': [47, 176153.0, 4], 'Type3': [0, 0, 0]}, '2017-9-12': {'Type1': [26, 198223.0, 5], 'Type2': [39, 178610.0, 6], 'Type3': [0, 0, 0]}}
df = pd.DataFrame.from_dict(mydic, orient='index')
I need to split values in the lists into different columns and group them by Types. This is what I do:
df_new = df[list(df)].unstack().apply(pd.Series)
df_new.head()
And it works:
0 1 2
Type1 2017-9-11 15.0 115452.0 3.0
2017-9-12 26.0 198223.0 5.0
Type3 2017-9-11 0.0 0.0 0.0
2017-9-12 0.0 0.0 0.0
Type2 2017-9-11 47.0 176153.0 4.0
BUT when I apply this code to a larger real-life data set it seems like apply(pd.Series)
doesn't work and I get just one column 0 with lists of values like this:
0
Type1 2017-9-11 [15, 115452.0, 3]
2017-9-12 [26, 198223.0, 5]
Type2 2017-9-11 [47, 176153.0, 4]
2017-9-12 [39, 178610.0, 6]
Type3 2017-9-11 [0, 0, 0]
Can anyone suggest what might be wrong? Or suggest an alternative solution?
Upvotes: 2
Views: 12309
Reputation: 862761
It think faster solution is DataFrame
constructor, see timings:
s = df.unstack()
df = pd.DataFrame(s.values.tolist(), index=s.index)
print (df)
0 1 2
Type1 2017-9-11 15 115452.0 3
2017-9-12 26 198223.0 5
Type2 2017-9-11 47 176153.0 4
2017-9-12 39 178610.0 6
Type3 2017-9-11 0 0.0 0
2017-9-12 0 0.0 0
EDIT:
If values are strings:
df = df.unstack().str.strip('[]').str.split(', ', expand=True).astype(float)
print (df)
0 1 2
Type1 2017-9-11 15.0 115452.0 3.0
2017-9-12 26.0 198223.0 5.0
Type2 2017-9-11 47.0 176153.0 4.0
2017-9-12 39.0 178610.0 6.0
Type3 2017-9-11 0.0 0.0 0.0
2017-9-12 0.0 0.0 0.0
Or is possible convert values to list
s:
import ast
s = df.unstack().apply(ast.literal_eval)
df = pd.DataFrame(s.values.tolist(), index=s.index).astype(float)
print (df)
0 1 2
Type1 2017-9-11 15.0 115452.0 3.0
2017-9-12 26.0 198223.0 5.0
Type2 2017-9-11 47.0 176153.0 4.0
2017-9-12 39.0 178610.0 6.0
Type3 2017-9-11 0.0 0.0 0.0
2017-9-12 0.0 0.0 0.0
Upvotes: 2
Reputation: 323306
For Data frame , point out witch columns to apply for .
df.unstack().to_frame()[0].apply(pd.Series)
Out[545]:
0 1 2
Type2 2017-9-11 47.0 176153.0 4.0
2017-9-12 39.0 178610.0 6.0
Type1 2017-9-11 15.0 115452.0 3.0
2017-9-12 26.0 198223.0 5.0
Type3 2017-9-11 0.0 0.0 0.0
2017-9-12 0.0 0.0 0.0
Break done :
df1=df.unstack().to_frame()
df1
Out[546]:
0
Type2 2017-9-11 [47, 176153.0, 4]
2017-9-12 [39, 178610.0, 6]
Type1 2017-9-11 [15, 115452.0, 3]
2017-9-12 [26, 198223.0, 5]
Type3 2017-9-11 [0, 0, 0]
2017-9-12 [0, 0, 0]
then do apply
:
df1[0].apply(pd.Series)
Out[550]:
0 1 2
Type2 2017-9-11 47.0 176153.0 4.0
2017-9-12 39.0 178610.0 6.0
Type1 2017-9-11 15.0 115452.0 3.0
2017-9-12 26.0 198223.0 5.0
Type3 2017-9-11 0.0 0.0 0.0
2017-9-12 0.0 0.0 0.0
Upvotes: 1