pivoting and grouping columns at the same time

Question

This is a data transformation question involving both pivoting and grouping the data. I looked at a bunch of SE posts on this topic and am almost there, but not quite. So any help is appreciated.

I have data that looks like this.

So you can see the classic database format where there is a well with some WELL ID number, and then there are different CHEMICALS measurements taken on different dates. So a simple pivot would just create new columns with each date as its own column. HOWEVER, the observations are captured at different dates, so I would end up with like 100 columns. Instead I want to create those pivoted columns but then average the values for a year.

GOAL:

So I would like the result to look something as below. Each well should have a separate columns for each Chemical-Year combination, and the remaining columns like latitude and longitude should remain in the final data.

Well Id    TCPR123_2019    TCPC123_2018    TBR123_2019  Latitude  Longitude   ...
0103041-001    0.0         0.25            0.0        37.72...  -122.157...
0105003-001    0.0         0.5             0.3        37.66...  -121.84...

WHAT HAVE I DONE SO FAR:

So, I did a number of transformations and using the code:

df.groupby(['WELL ID','CHEMICAL', 
    pd.Grouper(freq='Y', key='DATE', closed='left')])['RESULTS'].mean()

The results is this.

WELL ID      CHEMICAL  DATE      RESULTS
0103041-001  AS          2011-12-31      4.300000
                         2015-12-31      2.300000
                         2019-12-31      0.000000
0105003-001  AS          2011-12-31      0.000000
                         2014-12-31      0.000000

I just need to get this pivoted and get the remaining columns to show up. I had to use ['RESULTS'].mean() because otherwise it was averaging all of the columns. I tried to use df.unstack() but that did not include the other remaining columns. But I am not sure how to combine the pandas.DataFrame.pivot() with the df.groupby() operation? I am right at the last step, so any help is appreciated.

BENY · Accepted Answer

Just need to adding the unstack

df['year']=df.DATE.dt.year
s=df.groupby(['WELL ID','CHEMICAL', 'year'])['RESULTS'].mean().unstack([1,2])
s.columns=s.columns.map('{0[0]}_{0[1]}'.format)

pivoting and grouping columns at the same time

Answers (1)

Related Questions