Reputation: 553
I have a dataframe as below
+----+------+------+-----+-----+
| id | year | sell | buy | own |
+----+------+------+-----+-----+
| 1 | 2016 | 9 | 2 | 10 |
| 1 | 2017 | 9 | 0 | 10 |
| 1 | 2018 | 0 | 2 | 10 |
| 2 | 2016 | 7 | 2 | 11 |
| 2 | 2017 | 2 | 0 | 0 |
| 2 | 2018 | 0 | 0 | 18 |
+----+------+------+-----+-----+
I'm trying to transpose rows to columns but instead of aggregate the values, i wanted to keep some letters if it is not 0
(S-Sell,B-Buy,O-Own). If there are values for all columns for particular year then i need S_B_O for that year. If there are values only for sell and buy then S_B etc., so Expected output is
+----+-------+------+------+
| ID | 2016 | 2017 | 2018 |
+----+-------+------+------+
| 1 | S_B_O | S_O | B_O |
+----+-------+------+------+
| 2 | S_B_O | S | O |
+----+-------+------+------+
I'm new to python and have no idea how we can do this. I know only the basic pivot for aggregation as shown below. is it possible? any suggestions would be appreciated.
import pandas as pd
import numpy as np
df=pd.read_excel('Pivot.xlsx')
pivot = pd.pivot_table(df,index=["ID"],columns='year',values ='sell' ,aggfunc = np.sum,fill_value=0)
Dataframe
id,year,sell,buy,own
1,2016,9,2,10
1,2017,9,0,10
1,2018,0,2,10
2,2016,7,2,11
2,2017,2,0,0
2,2018,0,0,18
Upvotes: 1
Views: 109
Reputation: 62413
'sell'
, 'buy'
, and 'own'
as Boolean (df.iloc[:, -3:].astype(bool)
), multiply the Boolean by the characters ['S', 'B', 'O']
, and join
them as a combined string.
0
evaluates as False
, and the other numbers are True
df.iloc[:, -3:]
is equivalent to df[['sell', 'buy', 'own']]
, for this dataframe._
(e.g. _S_O
or __S
)..apply
, it will be slower than the solution by anky.astype(bool) * ['S', 'B', 'O']
was a neat trick, worth sharing.import pandas as pd
# sample dataframe
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2], 'year': [2016, 2017, 2018, 2016, 2017, 2018], 'sell': [9, 9, 0, 7, 2, 0], 'buy': [2, 0, 2, 2, 0, 0], 'own': [10, 10, 10, 11, 0, 18]})
# Cast sell, buy, own as Bool, multiple and combine as string
df['str'] = (df.iloc[:, -3:].astype(bool) * ['S', 'B', 'O']).apply(lambda x: '_'.join([v for v in x if v]), axis=1)
# display(df)
id year sell buy own str
0 1 2016 9 2 10 S_B_O
1 1 2017 9 0 10 S_O
2 1 2018 0 2 10 B_O
3 2 2016 7 2 11 S_B_O
4 2 2017 2 0 0 S
# pivot
dfp = df.pivot(index='id', columns='year', values='str')
# display(dfp)
year 2016 2017 2018
id
1 S_B_O S_O B_O
2 S_B_O S O
Upvotes: 4
Reputation: 4521
that's my favourite pandas operation (unstack) :-) You can do it as follows. Most of the work is to build the strings as you requested:
df['operation']= df['sell'].map(lambda v: ['S'] if v != 0 else [])
indexer= df['buy'] != 0
df.loc[indexer, 'operation'].map(lambda v: v.append('B'))
indexer= df['own'] != 0
df.loc[indexer, 'operation'].map(lambda v: v.append('O'))
df['operation']= df['operation'].map(lambda l: '_'.join(l))
df.set_index(['id', 'year'], inplace=True)
df_res= df['operation'].unstack()
df_res
With the test data like:
from io import StringIO
infile= StringIO(
""" id | year | sell | buy | own
1 | 2016 | 9 | 2 | 10
1 | 2017 | 9 | 0 | 10
1 | 2018 | 0 | 2 | 10
2 | 2016 | 7 | 2 | 11
2 | 2017 | 2 | 0 | 0
2 | 2018 | 0 | 0 | 18""")
df= pd.read_csv(infile, sep='|', dtype='int16') #.set_index('Date')
df.head()
df.columns= [col.strip() for col in df.columns]
you get the result:
year 2016 2017 2018
id
1 S_B_O S_O B_O
2 S_B_O S O
Upvotes: 1
Reputation: 75080
You can use df.dot
with df.pivot
here:
u = df[['sell','buy','own']]
(df.assign(v=u.ne(0).dot(u.columns.str[0].str.upper()+'_').str[:-1])
.pivot("id","year","v"))
year 2016 2017 2018
id
1 S_B_O S_O B_O
2 S_B_O S O
With full formatting;
u = df[['sell','buy','own']]
out = (df.assign(v=u.ne(0).dot(u.columns.str[0].str.upper()+'_').str[:-1])
.pivot("id","year","v").rename_axis(columns=None).reset_index())
print(out)
id 2016 2017 2018
0 1 S_B_O S_O B_O
1 2 S_B_O S O
Upvotes: 7