Reputation: 783
I have some weather dataset consisting of multiple columns:
StationID, altitude, datetime, longitude, latitude, rainfall
I have multiple stations, which are identified by their respective IDs. The rainfall column has accumulated rainfall amounts. For example, for station X in 10 days, I could have (in mm/day):
station X, 0 0 0 1 5 6 6 8 8 15
For station Y, I could have
*station Y, 0 1 14 14 14 15 18 18 18 20
But what I need are intensity values, that is, the amount from one day minus the other. This would give me the following values for stations X and Y (the first value starts with 0),
station X, 0 0 0 1 4 1 0 2 0 7
station Y, 0 1 13 0 0 1 3 0 0 2
I created a function, which takes in a time series and computes this difference:
def intensity(ts):
ts2 = [0]
for i in range(0,len(ts[:-1])):
ts2.append((ts[i+1]-ts[i]))
return ts2
test = [1,2,3,4,5,10,10,10,20,25]
intensity(test)
Now, my question is: how can I apply this function to the 'rainfall' column in my dataframe for each station group, i.e.:
dfg = df.groupby('station')
and then assign the output to a new column in the dataframe (e.g.: 'rain_intensity' column)?
Upvotes: 1
Views: 595
Reputation: 862611
I think you need:
print (df.groupby('station')['rainfall'].apply(intensity))
But better is diff
with replace NaN
to 0
by fillna
and then if necessary convert to int
:
print (df.groupby('StationID')['rainfall'].diff().fillna(0))
Sample:
df = pd.DataFrame({'rainfall': [0, 0, 0 ,1, 5, 6, 6, 8, 8, 15, 0, 1, 14, 14, 14, 15, 18, 18, 18, 20],
'StationID': ['station X'] * 10 + ['station Y'] * 10})
print (df)
StationID rainfall
0 station X 0
1 station X 0
2 station X 0
3 station X 1
4 station X 5
5 station X 6
6 station X 6
7 station X 8
8 station X 8
9 station X 15
10 station Y 0
11 station Y 1
12 station Y 14
13 station Y 14
14 station Y 14
15 station Y 15
16 station Y 18
17 station Y 18
18 station Y 18
19 station Y 20
def intensity(ts):
ts = ts.tolist()
ts2 = [0]
for i in range(0,len(ts[:-1])):
ts2.append((ts[i+1]-ts[i]))
return pd.Series(ts2)
df['diff1'] = df.groupby('StationID')['rainfall'].apply(intensity).reset_index(drop=True)
df['diff2'] = df.groupby('StationID')['rainfall'].diff().fillna(0).astype(int)
print (df)
StationID rainfall diff1 diff2
0 station X 0 0 0
1 station X 0 0 0
2 station X 0 0 0
3 station X 1 1 1
4 station X 5 4 4
5 station X 6 1 1
6 station X 6 0 0
7 station X 8 2 2
8 station X 8 0 0
9 station X 15 7 7
10 station Y 0 0 0
11 station Y 1 1 1
12 station Y 14 13 13
13 station Y 14 0 0
14 station Y 14 0 0
15 station Y 15 1 1
16 station Y 18 3 3
17 station Y 18 0 0
18 station Y 18 0 0
19 station Y 20 2 2
Upvotes: 1