cd98

Reputation: 3532

Conditional transform on pandas

I've got a very simple problem, but I can't seem to get it right. Consider this dataframe:

import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B'],
                   'time': [20, 21, 22, 20, 21],
                   'price': [3.1, 3.5, 3.0, 2.3, 2.1]})


  group  price  time
0     A    3.1    20
1     A    3.5    21
2     A    3.0    22
3     B    2.3    20
4     B    2.1    21

Now I want to take the standard deviation of the price within each group, but using only the rows with time before 22 (let's call it early_std). I then want to store that value in a new column on every row of the group, including the rows at time 22.

The expected result is

group  price  time  early_std
    A    3.1    20   0.282843
    A    3.5    21   0.282843
    A    3.0    22   0.282843
    B    2.3    20   0.141421
    B    2.1    21   0.141421

This is what I tried:

df['early_std'] = (df[df.time < 22]
                   .groupby('group')
                   .price.transform(lambda x: x.std()))

This almost works but it gives a missing value on time = 22:

  group  price  time  early_std
0     A    3.1    20   0.282843
1     A    3.5    21   0.282843
2     A    3.0    22        NaN
3     B    2.3    20   0.141421
4     B    2.1    21   0.141421

I also tried aggregating per group and assigning the result, and I think it works, but I have to set group as the index, which is something I'd rather avoid (I have a large dataset and I need to do this repeatedly):

early_std2 = df[df.time < 22].groupby('group').price.std()

df.set_index('group', inplace=True)
df['early_std2'] = early_std2

       price  time  early_std  early_std2
group
A        3.1    20   0.282843    0.282843
A        3.5    21   0.282843    0.282843
A        3.0    22        NaN    0.282843
B        2.3    20   0.141421    0.141421
B        2.1    21   0.141421    0.141421

Thanks!

Upvotes: 0

Views: 1586

Answers (2)

Primer

Reputation: 10302

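The filtered frame has no row for time = 22, so when the result is assigned back, index alignment leaves NaN on that row. It looks like you only need to add fillna() to your first code to expand the std values within each group:

# std of the early rows only; the time = 22 row is left as NaN
df['early_std'] = df[df.time < 22].groupby('group')['price'].transform('std')
# fill each group's NaN with that group's (single) std value, keeping the original index
df['early_std'] = df.groupby('group')['early_std'].transform(lambda x: x.fillna(x.max()))
df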

To get:

  group  price  time  early_std
0     A    3.1    20      0.283
1     A    3.5    21      0.283
2     A    3.0    22      0.283
3     B    2.3    20      0.141
4     B    2.1    21      0.141

EDIT: I have changed ffill to a more general fillna, but you could also use chained .bfill().ffill() to achieve the same result.
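For reference, here is a minimal sketch of the chained .bfill().ffill() variant mentioned in the edit, rebuilt on the question's sample data. It assumes the fill should stay inside each group, so it goes through groupby and transform rather than filling the whole column at once:

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B'],
                   'time': [20, 21, 22, 20, 21],
                   'price': [3.1, 3.5, 3.0, 2.3, 2.1]})

# Std of the early rows only; rows with time >= 22 are missing from the
# result, so assigning it back leaves NaN on those rows.
df['early_std'] = df[df['time'] < 22].groupby('group')['price'].transform('std')

# Fill the gaps in both directions, one group at a time, so a value never
# leaks from one group into another.
df['early_std'] = (df.groupby('group')['early_std']
                     .transform(lambda s: s.bfill().ffill()))
print(df)
```

Filling in both directions matters when the late rows are not guaranteed to sit at the end of each group.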

Upvotes: 1

burhan

Reputation: 924

Your second approach is very close to what you are trying to achieve. This may not be the most efficient method, but it worked for me:

df['early_std'] = 0.0
# early_std2 is the per-group Series from the question; group must still be a column here
for group, value in early_std2.items():
    df.loc[df['group'] == group, 'early_std'] = value
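If the loop turns out to be too slow on a large dataset, one possible alternative is to look each row's group up in the aggregated Series with Series.map. This is only a sketch on the question's sample data, reusing the name early_std2 from the question; it leaves group as an ordinary column, so no set_index or reset_index is involved:

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B'],
                   'time': [20, 21, 22, 20, 21],
                   'price': [3.1, 3.5, 3.0, 2.3, 2.1]})

# One std per group, computed from the early rows only.
# The result is a Series indexed by group label.
early_std2 = df[df['time'] < 22].groupby('group')['price'].std()

# Map every row's group label to that group's value.
df['early_std'] = df['group'].map(early_std2)
print(df)
```

A group with no early rows at all would simply get NaN here, since its label is absent from early_std2.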

Upvotes: 1
