MisterWeary
MisterWeary

Reputation: 605

How do I split data out from one column of a pandas dataframe into multiple columns of a new dataframe

I would like to split data from this pandas dataframe (let's call it df1):

YEAR   CODE   DIFF
2013   XXXX   5.50
2013   YYYY   8.50
2013   ZZZZ   6.50
2014   XXXX   4.50
2014   YYYY   2.50
2014   ZZZZ   3.50

Such that I create a new dataframe (let's call it df2) that looks like this:

YEAR   XXXX_DIFF   ZZZZ_DIFF
2013   5.50        6.50
2014   4.50        3.50

I guess I'm grouping by year and splitting the single column results found in DIFF into specific CODE matches. I have tried something like this:

df2 = df1[['YEAR','CODE','DIFF']].query('CODE == "XXXX"')

And I know I can rename columns and drop redundant ones but I wasn't sure how to get the ZZZZ DIFF values across to df2?

Upvotes: 1

Views: 92

Answers (3)

anky
anky

Reputation: 75080

Using pivot + filter + add_suffix:

out = (df.pivot(*df).filter(['XXXX','ZZZZ']).add_suffix('_DIFF')
                   .reset_index().rename_axis(None,axis=1))

print(out)

   YEAR  XXXX_DIFF  ZZZZ_DIFF
0  2013        5.5        6.5
1  2014        4.5        3.5

Upvotes: 3

sammywemmy
sammywemmy

Reputation: 28669

IIUC,

 df = (df
       #I use the first method because groupby automatically sorts
       #the largest will be at the top
       #first method gets the first row for each group
       .groupby(['YEAR','CODE'],as_index=False)['DIFF'].first()
       .query('CODE.isin(["XXXX","ZZZZ"])')
       .pivot(index='YEAR', columns = 'CODE'))

 #this lumps the multiindex columns into one
 #the reversed method flips the positions to match ur expected output
 df.columns = ['_'.join(reversed(i)) for i in df.columns.to_flat_index()]

df.reset_index()

    YEAR    XXXX_DIFF   ZZZZ_DIFF
0   2013        5.5       6.5
1   2014        4.5       3.5

Upvotes: 1

Allen Qin
Allen Qin

Reputation: 19947

You can first set index and unstack, finally remove the unwanted column level and rename.

(
    df1.loc[df.CODE!='YYYY']
    .set_index(['YEAR', 'CODE'])
    .unstack()
    .pipe(lambda x: x.set_axis(x.columns.droplevel(0)+'_DIFF',
                               axis=1, inplace=False))
)


CODE    XXXX_DIFF   ZZZZ_DIFF
YEAR        
2013    5.5         6.5
2014    4.5         3.5

Upvotes: 2

Related Questions