Reputation: 3532
I've got a very simple problem, but I can't seem to get it right. Consider this dataframe:
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B'],
                   'time': [20, 21, 22, 20, 21],
                   'price': [3.1, 3.5, 3.0, 2.3, 2.1]})
group price time
0 A 3.1 20
1 A 3.5 21
2 A 3.0 22
3 B 2.3 20
4 B 2.1 21
Now I want to take the standard deviation of the price of each group, but only over the rows before time 22 (let's call it early_std). I then want to store that value in a new column, repeated on every row of the group.
The expected result is:
group price time early_std
A 3.1 20 0.282843
A 3.5 21 0.282843
A 3.0 22 0.282843
B 2.3 20 0.141421
B 2.1 21 0.141421
This is what I tried:
df['early_std'] = df[df.time < 22].groupby('group').\
price.transform(lambda x : x.std())
This almost works, but it gives a missing value at time = 22:
group price time early_std
0 A 3.1 20 0.282843
1 A 3.5 21 0.282843
2 A 3.0 22 NaN
3 B 2.3 20 0.141421
4 B 2.1 21 0.141421
I also tried aggregating per group and assigning the result, and I think it works, but I need to change the index to do it, which is something I'd rather avoid (I have a large dataset and I need to do this repeatedly):
early_std2 = df[df.time < 22].groupby('group').price.std()
df.set_index('group', inplace=True)
df['early_std2'] = early_std2
price time early_std early_std2
group
A 3.1 20 0.282843 0.282843
A 3.5 21 0.282843 0.282843
A 3.0 22 NaN 0.282843
B 2.3 20 0.141421 0.141421
B 2.1 21 0.141421 0.141421
Thanks!
Upvotes: 0
Views: 1586
Reputation: 10302
It looks like you only need to add fillna() to your first code to expand the std values:
df['early_std'] = df[df.time < 22].groupby('group')['price'].transform(pd.Series.std)
# transform returns a result indexed like df, so the assignment aligns row by row
df['early_std'] = df.groupby('group')['early_std'].transform(lambda x: x.fillna(x.max()))
df
To get:
group price time early_std
0 A 3.1 20 0.283
1 A 3.5 21 0.283
2 A 3.0 22 0.283
3 B 2.3 20 0.141
4 B 2.1 21 0.141
EDIT: I have changed ffill to a more general fillna, but you could also use chained .bfill().ffill() to achieve the same result.
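If you would rather skip the fill step, one possible alternative is to blank out the late prices first and then let transform broadcast the group std to every row. This is only a sketch on the sample data from the question; the intermediate name early_price is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B'],
                   'time': [20, 21, 22, 20, 21],
                   'price': [3.1, 3.5, 3.0, 2.3, 2.1]})

# Keep prices where time < 22 and replace the rest with NaN
# (early_price is a hypothetical helper name, not from the question).
early_price = df['price'].where(df['time'] < 22)

# std skips NaN, and transform returns one value per original row,
# so the time == 22 row receives its group's early std as well.
df['early_std'] = early_price.groupby(df['group']).transform('std')
print(df)
```

Because the Series is never filtered, the result keeps the original index and no fillna is needed.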
Upvotes: 1
Reputation: 924
Your second approach is very close to what you are trying to achieve. This may not be the most efficient method but it worked for me:
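df['early_std'] = 0.0
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for group, value in early_std2.items():
    # .loc avoids chained assignment; this assumes 'group' is still a column
    df.loc[df['group'] == group, 'early_std'] = value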
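A loop-free variant of the same idea might be to look up each row's group in early_std2 with Series.map. The sketch below rebuilds the sample data from the question so it runs on its own, and it leaves the index untouched.

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B'],
                   'time': [20, 21, 22, 20, 21],
                   'price': [3.1, 3.5, 3.0, 2.3, 2.1]})

# One std per group, computed only from rows with time < 22
early_std2 = df[df['time'] < 22].groupby('group')['price'].std()

# map looks up each row's group label in early_std2's index,
# so every row of a group gets that group's value
df['early_std'] = df['group'].map(early_std2)
print(df)
```

A group with no rows before time 22 would not appear in early_std2 and would end up as NaN here rather than 0.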
Upvotes: 1