Reputation: 605
I would like to split data from this pandas dataframe (let's call it df1):
YEAR CODE DIFF
2013 XXXX 5.50
2013 YYYY 8.50
2013 ZZZZ 6.50
2014 XXXX 4.50
2014 YYYY 2.50
2014 ZZZZ 3.50
Such that I create a new dataframe (let's call it df2) that looks like this:
YEAR XXXX_DIFF ZZZZ_DIFF
2013 5.50 6.50
2014 4.50 3.50
I guess I'm grouping by year and splitting the single column results found in DIFF into specific CODE matches. I have tried something like this:
df2 = df1[['YEAR','CODE','DIFF']].query('CODE == "XXXX"')
And I know I can rename columns and drop redundant ones but I wasn't sure how to get the ZZZZ DIFF values across to df2?
Upvotes: 1
Views: 92
Reputation: 75080
Using pivot
+ filter
+ add_suffix
:
out = (df.pivot(*df).filter(['XXXX','ZZZZ']).add_suffix('_DIFF')
.reset_index().rename_axis(None,axis=1))
print(out)
YEAR XXXX_DIFF ZZZZ_DIFF
0 2013 5.5 6.5
1 2014 4.5 3.5
Upvotes: 3
Reputation: 28669
IIUC,
df = (df
#I use the first method because groupby automatically sorts
#the largest will be at the top
#first method gets the first row for each group
.groupby(['YEAR','CODE'],as_index=False)['DIFF'].first()
.query('CODE.isin(["XXXX","ZZZZ"])')
.pivot(index='YEAR', columns = 'CODE'))
#this lumps the multiindex columns into one
#the reversed method flips the positions to match ur expected output
df.columns = ['_'.join(reversed(i)) for i in df.columns.to_flat_index()]
df.reset_index()
YEAR XXXX_DIFF ZZZZ_DIFF
0 2013 5.5 6.5
1 2014 4.5 3.5
Upvotes: 1
Reputation: 19947
You can first set index and unstack, finally remove the unwanted column level and rename.
(
df1.loc[df.CODE!='YYYY']
.set_index(['YEAR', 'CODE'])
.unstack()
.pipe(lambda x: x.set_axis(x.columns.droplevel(0)+'_DIFF',
axis=1, inplace=False))
)
CODE XXXX_DIFF ZZZZ_DIFF
YEAR
2013 5.5 6.5
2014 4.5 3.5
Upvotes: 2