Reputation: 873
I had two dataframes of differing sizes that I would like to do calculations with. The first dataset is a time series. The second dataset are the long-term monthly averages.
The first:
year month snow_depth
0 1979 1 18.322581
1 1979 2 11.535714
2 1979 3 5.322581
3 1979 4 0.300000
4 1979 5 0.000000
5 1979 6 0.000000
6 1979 7 0.000000
7 1979 8 0.000000
8 1979 9 0.000000
9 1979 10 0.322581
10 1979 11 2.366667
11 1979 12 1.290323
12 1980 1 7.838710
13 1980 2 9.758621
14 1980 3 3.967742
15 1980 4 0.000000
16 1980 5 0.000000
17 1980 6 0.000000
18 1980 7 0.000000
19 1980 8 0.000000
20 1980 9 0.000000
21 1980 10 0.000000
22 1980 11 0.100000
23 1980 12 0.000000
24 1981 1 4.290323
25 1981 2 13.678571
26 1981 3 2.967742
27 1981 4 0.066667
28 1981 5 0.000000
29 1981 6 0.000000
30 1981 7 0.000000
31 1981 8 0.000000
32 1981 9 0.000000
33 1981 10 0.387097
34 1981 11 5.000000
35 1981 12 15.741935
And the second:
month
1 14.385199
2 15.312301
3 6.510436
4 0.268908
5 0.005819
6 0.000000
7 0.000000
8 0.000000
9 0.000000
10 0.363676
11 2.350980
12 6.730550
The code I have used is included below:
oname = 'Z:/Dan/SnowStatData/Monthly/Observations/'+str(stations[c])+'Observations.txt'
df = pd.read_table(oname,sep='\t',usecols=(0,1,2),names=['year','month','snow_depth'],na_values=-999)
df = df.replace(np.nan,0)
g = df.groupby(['month'])
oavg = g.aggregate({'snow_depth':np.average})
I've tried simply subtracting oavg from the original dataframe, but the first data value ends up becoming an NaN (because it seems to be going by df's index and not month) and then after index #12, I get NaNs throughout. Is there a simple way to go through each year and subtract out the oavg so that I have a full time series of departures from the mean?
EDIT: This is what I would like in the end:
year month snow_depth
0 1979 1 18.322581 - oavg[month=1]
1 1979 2 11.535714 - oavg[month=2]
2 1979 3 5.322581 - oavg[month=3]
3 1979 4 0.300000 - oavg[month=4]
4 1979 5 0.000000 - oavg[month=5]
5 1979 6 0.000000 - oavg[month=6]
6 1979 7 0.000000 - oavg[month=7]
7 1979 8 0.000000 - oavg[month=8]
8 1979 9 0.000000 - oavg[month=9]
9 1979 10 0.322581 - oavg[month=10]
10 1979 11 2.366667 - oavg[month=11]
11 1979 12 1.290323 - oavg[month=12]
And then repeat for every year thereafter:
12 1980 1 7.838710 - oavg[month=1]
13 1980 2 9.758621 - oavg[month=2]
14 1980 3 3.967742 - oavg[month=3]
etc. etc.
Upvotes: 3
Views: 1788
Reputation: 8493
Give this a try. Using map to pull directly from your series of averages
df["diff"] = df["snow_depth"] - df["month"].map(nameofyourseries)
year month snow_depth diff
0 1979 1 18.322581 3.937382
1 1979 2 11.535714 -3.776587
2 1979 3 5.322581 -1.187855
3 1979 4 0.300000 0.031092
4 1979 5 0.000000 -0.005819
5 1979 6 0.000000 0.000000
6 1979 7 0.000000 0.000000
7 1979 8 0.000000 0.000000
8 1979 9 0.000000 0.000000
9 1979 10 0.322581 -0.041095
10 1979 11 2.366667 0.015687
11 1979 12 1.290323 -5.440227
12 1980 1 7.838710 -6.546489
13 1980 2 9.758621 -5.553680
14 1980 3 3.967742 -2.542694
15 1980 4 0.000000 -0.268908
16 1980 5 0.000000 -0.005819
17 1980 6 0.000000 0.000000
18 1980 7 0.000000 0.000000
19 1980 8 0.000000 0.000000
20 1980 9 0.000000 0.000000
21 1980 10 0.000000 -0.363676
22 1980 11 0.100000 -2.250980
23 1980 12 0.000000 -6.730550
24 1981 1 4.290323 -10.094876
25 1981 2 13.678571 -1.633730
26 1981 3 2.967742 -3.542694
27 1981 4 0.066667 -0.202241
28 1981 5 0.000000 -0.005819
29 1981 6 0.000000 0.000000
30 1981 7 0.000000 0.000000
31 1981 8 0.000000 0.000000
32 1981 9 0.000000 0.000000
33 1981 10 0.387097 0.023421
34 1981 11 5.000000 2.649020
35 1981 12 15.741935 9.011385
Upvotes: 2
Reputation: 109546
I first renamed the column of df2 to 'avg':
df2.columns = ['avg']
Then, merge the average snowfall to df1 and then compute the difference:
df3 = df1.merge(df2, how='left', left_on='month', right_index=True, suffixes=('', '_rhs'))
df3['difference'] = df3['snow_depth'] - df3['avg']
>>> df3
year month snow_depth avg difference
0 1979 1 18.322581 14.385199 3.937382
1 1979 2 11.535714 15.312301 -3.776587
2 1979 3 5.322581 6.510436 -1.187855
3 1979 4 0.300000 0.268908 0.031092
4 1979 5 0.000000 0.005819 -0.005819
5 1979 6 0.000000 0.000000 0.000000
6 1979 7 0.000000 0.000000 0.000000
7 1979 8 0.000000 0.000000 0.000000
8 1979 9 0.000000 0.000000 0.000000
9 1979 10 0.322581 0.363676 -0.041095
10 1979 11 2.366667 2.350980 0.015687
11 1979 12 1.290323 6.730550 -5.440227
12 1980 1 7.838710 14.385199 -6.546489
13 1980 2 9.758621 15.312301 -5.553680
14 1980 3 3.967742 6.510436 -2.542694
15 1980 4 0.000000 0.268908 -0.268908
16 1980 5 0.000000 0.005819 -0.005819
17 1980 6 0.000000 0.000000 0.000000
18 1980 7 0.000000 0.000000 0.000000
19 1980 8 0.000000 0.000000 0.000000
20 1980 9 0.000000 0.000000 0.000000
21 1980 10 0.000000 0.363676 -0.363676
22 1980 11 0.100000 2.350980 -2.250980
23 1980 12 0.000000 6.730550 -6.730550
24 1981 1 4.290323 14.385199 -10.094876
25 1981 2 13.678571 15.312301 -1.633730
26 1981 3 2.967742 6.510436 -3.542694
27 1981 4 0.066667 0.268908 -0.202241
28 1981 5 0.000000 0.005819 -0.005819
29 1981 6 0.000000 0.000000 0.000000
30 1981 7 0.000000 0.000000 0.000000
31 1981 8 0.000000 0.000000 0.000000
32 1981 9 0.000000 0.000000 0.000000
33 1981 10 0.387097 0.363676 0.023421
34 1981 11 5.000000 2.350980 2.649020
35 1981 12 15.741935 6.730550 9.011385
Upvotes: 1