Reputation: 1363
I am trying to unstack two columns :
cols = res.columns[:31]
res[cols] = res[cols].ffill()
res = res.set_index(cols + [31])[32].unstack().reset_index().rename_axis(None, 1)
But I am getting an error :
TypeError: can only perform ops with scalar values
What should I do to avoid it?
My original problem : LINK
Upvotes: 2
Views: 606
Reputation: 863176
I think need convert columns to list:
cols = res.columns[:31].tolist()
EDIT:
Index contains duplicate entries, cannot reshape
means duplicates, here for first 6 columns, so is impossible create new DataFrame, because first 6 column create new index and 7. column create new column, and for 8. column are 2 values:
0 1 2 3 4 5 6 7
0 xx s 1 d f df f 54
1 xx s 1 d f df f g4
New DataFrame:
index = xx s 1 d f df
column = f
value = 54
index = xx s 1 d f df
column = f
value = g4
So solution is aggregate, here working with strings, so need .apply(', '.join)
:
index = xx s 1 d f df
column = f
value = 54, g4
Or remove duplicates and keep first or last value of dupes rows by drop_duplicates
:
index = xx s 1 d f df
column = f
value = 54
index = xx s 1 d f df
column = f
value = g4
res = pd.DataFrame({0: ['xx',np.nan,np.nan,np.nan,'ds', np.nan, np.nan, np.nan, np.nan, 'as'],
1: ['s',np.nan,np.nan,np.nan,'a', np.nan, np.nan, np.nan, np.nan, 't'],
2: ['1',np.nan,np.nan,np.nan,'s', np.nan, np.nan, np.nan, np.nan, 'r'],
3: ['d',np.nan, np.nan, np.nan,'d', np.nan, np.nan, np.nan, np.nan, 'a'],
4: ['f',np.nan, np.nan, np.nan,'f', np.nan, np.nan, np.nan, np.nan, '2'],
5: ['df',np.nan,np.nan,np.nan,'ds',np.nan, np.nan, np.nan, np.nan, 'ds'],
6: ['f','f', 'x', 'r', 'f', 'd', 's', '1', '3', 'k'],
7: ['54','g4', 'r4', '43', '64', '43', 'se', 'gf', 's3', 's4']})
cols = res.columns[:6].tolist()
res[cols] = res[cols].ffill()
print (res)
0 1 2 3 4 5 6 7
0 xx s 1 d f df f 54
1 xx s 1 d f df f g4
2 xx s 1 d f df x r4
3 xx s 1 d f df r 43
4 ds a s d f ds f 64
5 ds a s d f ds d 43
6 ds a s d f ds s se
7 ds a s d f ds 1 gf
8 ds a s d f ds 3 s3
9 as t r a 2 ds k s4
res =res.groupby(cols + [6])[7].apply(', '.join).unstack().reset_index().rename_axis(None, 1)
print (res)
0 1 2 3 4 5 1 3 d f k r s x
0 as t r a 2 ds NaN NaN NaN NaN s4 NaN NaN NaN
1 ds a s d f ds gf s3 43 64 NaN NaN se NaN
2 xx s 1 d f df NaN NaN NaN 54, g4 NaN 43 NaN r4 <-54, g4
Another solution is remove duplicates:
res = res.drop_duplicates(cols + [6])
res = res.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, 1)
print (res)
0 1 2 3 4 5 1 3 d f k r s x
0 as t r a 2 ds NaN NaN NaN NaN s4 NaN NaN NaN
1 ds a s d f ds gf s3 43 64 NaN NaN se NaN
2 xx s 1 d f df NaN NaN NaN 54 NaN 43 NaN r4 <- 54
res = res.drop_duplicates(cols + [6], keep='last')
res = res.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, 1)
print (res)
0 1 2 3 4 5 1 3 d f k r s x
0 as t r a 2 ds NaN NaN NaN NaN s4 NaN NaN NaN
1 ds a s d f ds gf s3 43 64 NaN NaN se NaN
2 xx s 1 d f df NaN NaN NaN g4 NaN 43 NaN r4 <- g4
Upvotes: 1