Reputation: 10060
This is a data transformation question involving both pivoting and grouping the data. I looked at a bunch of SE posts on this topic and am almost there, but not quite. So any help is appreciated.
I have data that looks like this.
So you can see the classic database format where there is a well with some WELL ID number, and then there are different CHEMICALS measurements taken on different dates. So a simple pivot would just create new columns with each date as its own column. HOWEVER, the observations are captured at different dates, so I would end up with like 100 columns. Instead I want to create those pivoted columns but then average the values for a year.
GOAL:
So I would like the result to look something as below. Each well should have a separate columns for each Chemical-Year
combination, and the remaining columns like latitude and longitude should remain in the final data.
Well Id TCPR123_2019 TCPC123_2018 TBR123_2019 Latitude Longitude ...
0103041-001 0.0 0.25 0.0 37.72... -122.157...
0105003-001 0.0 0.5 0.3 37.66... -121.84...
WHAT HAVE I DONE SO FAR:
So, I did a number of transformations and using the code:
df.groupby(['WELL ID','CHEMICAL',
pd.Grouper(freq='Y', key='DATE', closed='left')])['RESULTS'].mean()
The results is this.
WELL ID CHEMICAL DATE RESULTS
0103041-001 AS 2011-12-31 4.300000
2015-12-31 2.300000
2019-12-31 0.000000
0105003-001 AS 2011-12-31 0.000000
2014-12-31 0.000000
I just need to get this pivoted and get the remaining columns to show up. I had to use ['RESULTS'].mean()
because otherwise it was averaging all of the columns. I tried to use df.unstack()
but that did not include the other remaining columns. But I am not sure how to combine the pandas.DataFrame.pivot()
with the df.groupby()
operation? I am right at the last step, so any help is appreciated.
Upvotes: 0
Views: 40
Reputation: 323226
Just need to adding the unstack
df['year']=df.DATE.dt.year
s=df.groupby(['WELL ID','CHEMICAL', 'year'])['RESULTS'].mean().unstack([1,2])
s.columns=s.columns.map('{0[0]}_{0[1]}'.format)
Upvotes: 1