aviss
aviss

Reputation: 2439

Working with lists in pandas: apply(pd.Series) doesn't work - an alternative solution?

I have a data set structured like this:

mydic = {'2017-9-11': {'Type1': [15, 115452.0, 3], 'Type2': [47, 176153.0, 4], 'Type3': [0, 0, 0]}, '2017-9-12': {'Type1': [26, 198223.0, 5], 'Type2': [39, 178610.0, 6], 'Type3': [0, 0, 0]}}
df = pd.DataFrame.from_dict(mydic, orient='index')

I need to split values in the lists into different columns and group them by Types. This is what I do:

df_new = df[list(df)].unstack().apply(pd.Series)
df_new.head()

And it works:

                    0       1           2
Type1   2017-9-11   15.0    115452.0    3.0
        2017-9-12   26.0    198223.0    5.0
Type3   2017-9-11   0.0     0.0         0.0
        2017-9-12   0.0     0.0         0.0
Type2   2017-9-11   47.0    176153.0    4.0

BUT when I apply this code to a larger real-life data set it seems like apply(pd.Series) doesn't work and I get just one column 0 with lists of values like this:

                    0    
Type1  2017-9-11    [15, 115452.0, 3]
       2017-9-12    [26, 198223.0, 5]
Type2  2017-9-11    [47, 176153.0, 4]
       2017-9-12    [39, 178610.0, 6]
Type3  2017-9-11            [0, 0, 0]

Can anyone suggest what might be wrong? Or suggest an alternative solution?

Upvotes: 2

Views: 12309

Answers (2)

jezrael
jezrael

Reputation: 862761

It think faster solution is DataFrame constructor, see timings:

s = df.unstack()
df = pd.DataFrame(s.values.tolist(), index=s.index)
print (df)
                  0         1  2
Type1 2017-9-11  15  115452.0  3
      2017-9-12  26  198223.0  5
Type2 2017-9-11  47  176153.0  4
      2017-9-12  39  178610.0  6
Type3 2017-9-11   0       0.0  0
      2017-9-12   0       0.0  0

EDIT:

If values are strings:

df = df.unstack().str.strip('[]').str.split(', ', expand=True).astype(float)
print (df)
                    0         1    2
Type1 2017-9-11  15.0  115452.0  3.0
      2017-9-12  26.0  198223.0  5.0
Type2 2017-9-11  47.0  176153.0  4.0
      2017-9-12  39.0  178610.0  6.0
Type3 2017-9-11   0.0       0.0  0.0
      2017-9-12   0.0       0.0  0.0

Or is possible convert values to lists:

import ast

s = df.unstack().apply(ast.literal_eval)
df = pd.DataFrame(s.values.tolist(), index=s.index).astype(float)
print (df)
                    0         1    2
Type1 2017-9-11  15.0  115452.0  3.0
      2017-9-12  26.0  198223.0  5.0
Type2 2017-9-11  47.0  176153.0  4.0
      2017-9-12  39.0  178610.0  6.0
Type3 2017-9-11   0.0       0.0  0.0
      2017-9-12   0.0       0.0  0.0

Upvotes: 2

BENY
BENY

Reputation: 323306

For Data frame , point out witch columns to apply for .

df.unstack().to_frame()[0].apply(pd.Series)

Out[545]: 
                    0         1    2
Type2 2017-9-11  47.0  176153.0  4.0
      2017-9-12  39.0  178610.0  6.0
Type1 2017-9-11  15.0  115452.0  3.0
      2017-9-12  26.0  198223.0  5.0
Type3 2017-9-11   0.0       0.0  0.0
      2017-9-12   0.0       0.0  0.0

Break done :

df1=df.unstack().to_frame()
df1
Out[546]: 
                                 0
Type2 2017-9-11  [47, 176153.0, 4]
      2017-9-12  [39, 178610.0, 6]
Type1 2017-9-11  [15, 115452.0, 3]
      2017-9-12  [26, 198223.0, 5]
Type3 2017-9-11          [0, 0, 0]
      2017-9-12          [0, 0, 0]

then do apply :

df1[0].apply(pd.Series)

Out[550]: 
                    0         1    2
Type2 2017-9-11  47.0  176153.0  4.0
      2017-9-12  39.0  178610.0  6.0
Type1 2017-9-11  15.0  115452.0  3.0
      2017-9-12  26.0  198223.0  5.0
Type3 2017-9-11   0.0       0.0  0.0
      2017-9-12   0.0       0.0  0.0

Upvotes: 1

Related Questions