qwww
qwww

Reputation: 1363

Why pandas unstack is throwing an error?

I am trying to unstack two columns :

cols = res.columns[:31]
res[cols] = res[cols].ffill()
res = res.set_index(cols + [31])[32].unstack().reset_index().rename_axis(None, 1)

But I am getting an error :

TypeError: can only perform ops with scalar values

What should I do to avoid it?

My original problem : LINK

Upvotes: 2

Views: 606

Answers (1)

jezrael
jezrael

Reputation: 863176

I think need convert columns to list:

cols = res.columns[:31].tolist()

EDIT:

Index contains duplicate entries, cannot reshape

means duplicates, here for first 6 columns, so is impossible create new DataFrame, because first 6 column create new index and 7. column create new column, and for 8. column are 2 values:

    0  1  2  3  4   5  6   7
0  xx  s  1  d  f  df  f  54 
1  xx  s  1  d  f  df  f  g4 

New DataFrame:

 index = xx  s  1  d  f  df
 column = f
 value = 54 

 index = xx  s  1  d  f  df
 column = f
 value = g4 

So solution is aggregate, here working with strings, so need .apply(', '.join):

 index = xx  s  1  d  f  df
 column = f
 value = 54, g4 

Or remove duplicates and keep first or last value of dupes rows by drop_duplicates:

 index = xx  s  1  d  f  df
 column = f
 value = 54
 index = xx  s  1  d  f  df
 column = f
 value = g4

res = pd.DataFrame({0: ['xx',np.nan,np.nan,np.nan,'ds', np.nan, np.nan, np.nan, np.nan, 'as'],
                    1: ['s',np.nan,np.nan,np.nan,'a', np.nan, np.nan, np.nan, np.nan, 't'],
                    2: ['1',np.nan,np.nan,np.nan,'s', np.nan, np.nan, np.nan, np.nan, 'r'],
                    3: ['d',np.nan, np.nan, np.nan,'d', np.nan, np.nan, np.nan, np.nan, 'a'],
                    4: ['f',np.nan, np.nan, np.nan,'f', np.nan, np.nan, np.nan, np.nan, '2'],
                    5: ['df',np.nan,np.nan,np.nan,'ds',np.nan, np.nan, np.nan, np.nan, 'ds'],
                    6: ['f','f', 'x', 'r', 'f', 'd', 's', '1', '3', 'k'], 
                    7: ['54','g4', 'r4', '43', '64', '43', 'se', 'gf', 's3', 's4']})


cols = res.columns[:6].tolist()
res[cols] = res[cols].ffill()
print (res)
    0  1  2  3  4   5  6   7
0  xx  s  1  d  f  df  f  54 
1  xx  s  1  d  f  df  f  g4 
2  xx  s  1  d  f  df  x  r4
3  xx  s  1  d  f  df  r  43
4  ds  a  s  d  f  ds  f  64
5  ds  a  s  d  f  ds  d  43
6  ds  a  s  d  f  ds  s  se
7  ds  a  s  d  f  ds  1  gf
8  ds  a  s  d  f  ds  3  s3
9  as  t  r  a  2  ds  k  s4

res =res.groupby(cols + [6])[7].apply(', '.join).unstack().reset_index().rename_axis(None, 1)
print (res)

    0  1  2  3  4   5    1    3    d       f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN     NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43      64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN  54, g4  NaN   43  NaN   r4 <-54, g4

Another solution is remove duplicates:

res = res.drop_duplicates(cols + [6])

res = res.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, 1)
print (res)
    0  1  2  3  4   5    1    3    d    f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN  NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43   64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN   54  NaN   43  NaN   r4 <- 54
res = res.drop_duplicates(cols + [6], keep='last')

res = res.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, 1)
print (res)
    0  1  2  3  4   5    1    3    d    f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN  NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43   64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN   g4  NaN   43  NaN   r4 <- g4

Upvotes: 1

Related Questions