Reputation: 10513
Suppose we've got a test
dataset:
value group
123 1
120 1
NA 1
130 1
23 2
22 2
24 2
NA 2
Now we want to replace missing values with group
-wise median values. In R
we can do it using a nested ifelse
call.
first.med <- median(test[test$group == 1, ]$value, na.rm = T)
second.med <- median(test[test$group == 2, ]$value, na.rm = T)
test$value <- ifelse(is.na(test$value) & test$group == 1, first.med
ifelse(is.na(test$value) & test$group == 2, second.med,
test$value))
I though about applying the numpy.where
function or the pandas.DataFrame.Set.map
method as showcased here, but both techniques do not support nesting. I can think of a list comprehension to do this, but I wish to know if there is an alternative in the realm of NumPy/pandas. Thank you in advance.
Upvotes: 2
Views: 1457
Reputation: 5414
df = pd.DataFrame({'value' : [123,120,np.nan ,130,23 ,22 ,24 ,np.nan] , 'group' : [1 , 1 ,1 , 1 , 2 , 2 , 2 , 2] })
def replace_with_median(df):
df['value'][pd.isnull(df['value'])] = df['value'].median()
return df
df.groupby('group').apply(replace_with_median)
Upvotes: 1
Reputation: 52276
In this case, you can use a groupby
to fill by the group median:
In [16]: df.groupby('group')['value'].apply(lambda x: x.fillna(x.median()))
Out[16]:
0 123
1 120
2 123
3 130
4 23
5 22
6 24
7 23
dtype: float64
Although in general, both of those methods can be nested just fine. E.g., you could do:
In [23]: medians = df.groupby('group')['value'].median()
In [24]: np.where(pd.isnull(df['value']),
np.where(df['group'] == 1, medians.loc[1], medians.loc[2]),
df['value'])
Out[24]: array([ 123., 120., 123., 130., 23., 22., 24., 23.])
Upvotes: 3