DJV
DJV

Reputation: 873

Python Pandas: Calculations with Two Different Size Dataframes

I had two dataframes of differing sizes that I would like to do calculations with. The first dataset is a time series. The second dataset are the long-term monthly averages.

The first:

     year  month  snow_depth
0    1979      1   18.322581
1    1979      2   11.535714
2    1979      3    5.322581
3    1979      4    0.300000
4    1979      5    0.000000
5    1979      6    0.000000
6    1979      7    0.000000
7    1979      8    0.000000
8    1979      9    0.000000
9    1979     10    0.322581
10   1979     11    2.366667
11   1979     12    1.290323
12   1980      1    7.838710
13   1980      2    9.758621
14   1980      3    3.967742
15   1980      4    0.000000
16   1980      5    0.000000
17   1980      6    0.000000
18   1980      7    0.000000
19   1980      8    0.000000
20   1980      9    0.000000
21   1980     10    0.000000
22   1980     11    0.100000
23   1980     12    0.000000
24   1981      1    4.290323
25   1981      2   13.678571
26   1981      3    2.967742
27   1981      4    0.066667
28   1981      5    0.000000
29   1981      6    0.000000
30   1981      7    0.000000
31   1981      8    0.000000
32   1981      9    0.000000
33   1981     10    0.387097
34   1981     11    5.000000
35   1981     12   15.741935

And the second:

month            
1       14.385199
2       15.312301
3        6.510436
4        0.268908
5        0.005819
6        0.000000
7        0.000000
8        0.000000
9        0.000000
10       0.363676
11       2.350980
12       6.730550

The code I have used is included below:

oname = 'Z:/Dan/SnowStatData/Monthly/Observations/'+str(stations[c])+'Observations.txt'

df = pd.read_table(oname,sep='\t',usecols=(0,1,2),names=['year','month','snow_depth'],na_values=-999)
df = df.replace(np.nan,0)
g = df.groupby(['month'])
oavg = g.aggregate({'snow_depth':np.average})

I've tried simply subtracting oavg from the original dataframe, but the first data value ends up becoming an NaN (because it seems to be going by df's index and not month) and then after index #12, I get NaNs throughout. Is there a simple way to go through each year and subtract out the oavg so that I have a full time series of departures from the mean?

EDIT: This is what I would like in the end:

     year  month  snow_depth
0    1979      1   18.322581 - oavg[month=1]
1    1979      2   11.535714 - oavg[month=2]
2    1979      3    5.322581 - oavg[month=3]
3    1979      4    0.300000 - oavg[month=4]
4    1979      5    0.000000 - oavg[month=5]
5    1979      6    0.000000 - oavg[month=6]
6    1979      7    0.000000 - oavg[month=7]
7    1979      8    0.000000 - oavg[month=8]
8    1979      9    0.000000 - oavg[month=9]
9    1979     10    0.322581 - oavg[month=10]
10   1979     11    2.366667 - oavg[month=11]
11   1979     12    1.290323 - oavg[month=12]

And then repeat for every year thereafter:

12   1980      1    7.838710 - oavg[month=1]
13   1980      2    9.758621 - oavg[month=2]
14   1980      3    3.967742 - oavg[month=3]

etc. etc.

Upvotes: 3

Views: 1788

Answers (2)

Bob Haffner
Bob Haffner

Reputation: 8493

Give this a try. Using map to pull directly from your series of averages

df["diff"] = df["snow_depth"] - df["month"].map(nameofyourseries)

        year  month  snow_depth       diff
0   1979      1   18.322581   3.937382
1   1979      2   11.535714  -3.776587
2   1979      3    5.322581  -1.187855
3   1979      4    0.300000   0.031092
4   1979      5    0.000000  -0.005819
5   1979      6    0.000000   0.000000
6   1979      7    0.000000   0.000000
7   1979      8    0.000000   0.000000
8   1979      9    0.000000   0.000000
9   1979     10    0.322581  -0.041095
10  1979     11    2.366667   0.015687
11  1979     12    1.290323  -5.440227
12  1980      1    7.838710  -6.546489
13  1980      2    9.758621  -5.553680
14  1980      3    3.967742  -2.542694
15  1980      4    0.000000  -0.268908
16  1980      5    0.000000  -0.005819
17  1980      6    0.000000   0.000000
18  1980      7    0.000000   0.000000
19  1980      8    0.000000   0.000000
20  1980      9    0.000000   0.000000
21  1980     10    0.000000  -0.363676
22  1980     11    0.100000  -2.250980
23  1980     12    0.000000  -6.730550
24  1981      1    4.290323 -10.094876
25  1981      2   13.678571  -1.633730
26  1981      3    2.967742  -3.542694
27  1981      4    0.066667  -0.202241
28  1981      5    0.000000  -0.005819
29  1981      6    0.000000   0.000000
30  1981      7    0.000000   0.000000
31  1981      8    0.000000   0.000000
32  1981      9    0.000000   0.000000
33  1981     10    0.387097   0.023421
34  1981     11    5.000000   2.649020
35  1981     12   15.741935   9.011385

Upvotes: 2

Alexander
Alexander

Reputation: 109546

I first renamed the column of df2 to 'avg':

df2.columns = ['avg']

Then, merge the average snowfall to df1 and then compute the difference:

df3 = df1.merge(df2, how='left', left_on='month', right_index=True, suffixes=('', '_rhs'))

df3['difference'] = df3['snow_depth'] - df3['avg']

>>> df3
    year  month  snow_depth        avg  difference
0   1979      1   18.322581  14.385199    3.937382
1   1979      2   11.535714  15.312301   -3.776587
2   1979      3    5.322581   6.510436   -1.187855
3   1979      4    0.300000   0.268908    0.031092
4   1979      5    0.000000   0.005819   -0.005819
5   1979      6    0.000000   0.000000    0.000000
6   1979      7    0.000000   0.000000    0.000000
7   1979      8    0.000000   0.000000    0.000000
8   1979      9    0.000000   0.000000    0.000000
9   1979     10    0.322581   0.363676   -0.041095
10  1979     11    2.366667   2.350980    0.015687
11  1979     12    1.290323   6.730550   -5.440227
12  1980      1    7.838710  14.385199   -6.546489
13  1980      2    9.758621  15.312301   -5.553680
14  1980      3    3.967742   6.510436   -2.542694
15  1980      4    0.000000   0.268908   -0.268908
16  1980      5    0.000000   0.005819   -0.005819
17  1980      6    0.000000   0.000000    0.000000
18  1980      7    0.000000   0.000000    0.000000
19  1980      8    0.000000   0.000000    0.000000
20  1980      9    0.000000   0.000000    0.000000
21  1980     10    0.000000   0.363676   -0.363676
22  1980     11    0.100000   2.350980   -2.250980
23  1980     12    0.000000   6.730550   -6.730550
24  1981      1    4.290323  14.385199  -10.094876
25  1981      2   13.678571  15.312301   -1.633730
26  1981      3    2.967742   6.510436   -3.542694
27  1981      4    0.066667   0.268908   -0.202241
28  1981      5    0.000000   0.005819   -0.005819
29  1981      6    0.000000   0.000000    0.000000
30  1981      7    0.000000   0.000000    0.000000
31  1981      8    0.000000   0.000000    0.000000
32  1981      9    0.000000   0.000000    0.000000
33  1981     10    0.387097   0.363676    0.023421
34  1981     11    5.000000   2.350980    2.649020
35  1981     12   15.741935   6.730550    9.011385

Upvotes: 1

Related Questions