Avinash
Avinash

Reputation: 553

Row to column transformation in pandas

I have a dataframe as below

+----+------+------+-----+-----+
| id | year | sell | buy | own |
+----+------+------+-----+-----+
| 1  | 2016 | 9    | 2   | 10  |
| 1  | 2017 | 9    | 0   | 10  |
| 1  | 2018 | 0    | 2   | 10  |
| 2  | 2016 | 7    | 2   | 11  |
| 2  | 2017 | 2    | 0   |  0  |
| 2  | 2018 | 0    | 0   | 18  |
+----+------+------+-----+-----+

I'm trying to transpose rows to columns but instead of aggregate the values, i wanted to keep some letters if it is not 0(S-Sell,B-Buy,O-Own). If there are values for all columns for particular year then i need S_B_O for that year. If there are values only for sell and buy then S_B etc., so Expected output is

+----+-------+------+------+
| ID | 2016  | 2017 | 2018 |
+----+-------+------+------+
| 1  | S_B_O | S_O  | B_O  |
+----+-------+------+------+
| 2  | S_B_O | S    | O    |
+----+-------+------+------+

I'm new to python and have no idea how we can do this. I know only the basic pivot for aggregation as shown below. is it possible? any suggestions would be appreciated.

import pandas as pd
import numpy as np

df=pd.read_excel('Pivot.xlsx')

pivot = pd.pivot_table(df,index=["ID"],columns='year',values ='sell' ,aggfunc = np.sum,fill_value=0)

Dataframe

id,year,sell,buy,own
1,2016,9,2,10
1,2017,9,0,10
1,2018,0,2,10
2,2016,7,2,11
2,2017,2,0,0
2,2018,0,0,18

Upvotes: 1

Views: 109

Answers (3)

Trenton McKinney
Trenton McKinney

Reputation: 62413

  • Cast 'sell', 'buy', and 'own' as Boolean (df.iloc[:, -3:].astype(bool)), multiply the Boolean by the characters ['S', 'B', 'O'], and join them as a combined string.
    • 0 evaluates as False, and the other numbers are True
    • df.iloc[:, -3:] is equivalent to df[['sell', 'buy', 'own']], for this dataframe.
  • The list comprehension removes empty values, otherwise the results would include extra _ (e.g. _S_O or __S).
  • Since this solution uses, .apply, it will be slower than the solution by anky
  • I thought .astype(bool) * ['S', 'B', 'O'] was a neat trick, worth sharing.
import pandas as pd

# sample dataframe
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2], 'year': [2016, 2017, 2018, 2016, 2017, 2018], 'sell': [9, 9, 0, 7, 2, 0], 'buy': [2, 0, 2, 2, 0, 0], 'own': [10, 10, 10, 11, 0, 18]})

# Cast sell, buy, own as Bool, multiple and combine as string
df['str'] = (df.iloc[:, -3:].astype(bool) * ['S', 'B', 'O']).apply(lambda x: '_'.join([v for v in x if v]), axis=1)

# display(df)
   id  year  sell  buy  own    str
0   1  2016     9    2   10  S_B_O
1   1  2017     9    0   10    S_O
2   1  2018     0    2   10    B_O
3   2  2016     7    2   11  S_B_O
4   2  2017     2    0    0      S

# pivot
dfp = df.pivot(index='id', columns='year', values='str')

# display(dfp)
year   2016 2017 2018
id                   
1     S_B_O  S_O  B_O
2     S_B_O    S    O

Running in Idle

enter image description here

Upvotes: 4

jottbe
jottbe

Reputation: 4521

that's my favourite pandas operation (unstack) :-) You can do it as follows. Most of the work is to build the strings as you requested:

df['operation']= df['sell'].map(lambda v: ['S'] if v != 0 else [])
indexer= df['buy'] != 0
df.loc[indexer, 'operation'].map(lambda v: v.append('B'))
indexer= df['own'] != 0
df.loc[indexer, 'operation'].map(lambda v: v.append('O'))
df['operation']= df['operation'].map(lambda l: '_'.join(l))
df.set_index(['id', 'year'], inplace=True)
df_res= df['operation'].unstack()
df_res

With the test data like:

from io import StringIO
infile= StringIO(
""" id | year | sell | buy | own 
 1  | 2016 | 9    | 2   | 10  
 1  | 2017 | 9    | 0   | 10  
 1  | 2018 | 0    | 2   | 10  
 2  | 2016 | 7    | 2   | 11  
 2  | 2017 | 2    | 0   |  0  
 2  | 2018 | 0    | 0   | 18""")
df= pd.read_csv(infile, sep='|', dtype='int16') #.set_index('Date')
df.head()
df.columns= [col.strip() for col in df.columns]

you get the result:

year   2016 2017 2018
id                   
1     S_B_O  S_O  B_O
2     S_B_O    S    O

Upvotes: 1

anky
anky

Reputation: 75080

You can use df.dot with df.pivot here:

u = df[['sell','buy','own']]
(df.assign(v=u.ne(0).dot(u.columns.str[0].str.upper()+'_').str[:-1])
.pivot("id","year","v"))

year   2016 2017 2018
id                   
1     S_B_O  S_O  B_O
2     S_B_O    S    O

With full formatting;

u = df[['sell','buy','own']]
out = (df.assign(v=u.ne(0).dot(u.columns.str[0].str.upper()+'_').str[:-1])
      .pivot("id","year","v").rename_axis(columns=None).reset_index())
print(out)

   id   2016 2017 2018
0   1  S_B_O  S_O  B_O
1   2  S_B_O    S    O

Upvotes: 7

Related Questions