Reputation: 30605
Let's say I have a DataFrame with NaNs in some of its groups, like
df = pd.DataFrame({'data':[0,1,2,0,np.nan,2,np.nan,0,1],'group':[1,1,1,2,2,2,3,3,3]})
and a numpy array like
x = np.array([0,1,2])
Now, group by group, how do I fill the NaNs with the values from the numpy array that are missing from that group? That is, the expected result is
df = pd.DataFrame({'data':[0,1,2,0,1,2,2,0,1],'group':[1,1,1,2,2,2,3,3,3]})
data group
0 0 1
1 1 1
2 2 1
3 0 2
4 1 2
5 2 2
6 2 3
7 0 3
8 1 3
Let me explain how the data should be filled. Consider group 2: the values of data are 0, np.nan, 2. The np.nan stands in for the value missing from the array [0,1,2], so the value to fill in place of the NaN is 1.
For multiple NaN values, take for example a group whose data is [np.nan,0,np.nan]: the values to fill in place of the NaNs are 1 and 2, resulting in [1,0,2].
Upvotes: 1
Views: 1789
Reputation: 862691
First find the value that is missing from the group, then pass it to fillna:
def f(y):
    a = list(set(x)-set(y))
    a = 1 if len(a) == 0 else a[0]
    y = y.fillna(a)
    return y
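# group_keys=False keeps the original row index, so the result aligns with df on recent pandas versions
df['data'] = df.groupby('group', group_keys=False)['data'].apply(f).astype(int)
print(df)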
data group
0 0 1
1 1 1
2 2 1
3 0 2
4 1 2
5 2 2
6 2 3
7 0 3
8 1 3
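To see the set-difference step in isolation, here is a minimal sketch for group 2 only. The names `y`, `missing` and `filled` are just illustrative; the NaN in the group does not disturb the result because it is not an element of `x`.

```python
import numpy as np
import pandas as pd

x = np.array([0, 1, 2])
y = pd.Series([0, np.nan, 2])  # the 'data' values of group 2

# Values present in x but absent from the group; the NaN in y is simply ignored
missing = list(set(x) - set(y))

# Only one value is missing here, so a single fillna call is enough
filled = y.fillna(missing[0])
print(filled.tolist())
```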
EDIT:
df = pd.DataFrame({'data':[0,1,2,0,np.nan,2,np.nan,np.nan,1, np.nan, np.nan, np.nan],
'group':[1,1,1,2,2,2,3,3,3,4,4,4]})
x = np.array([0,1,2])
print (df)
data group
0 0.0 1
1 1.0 1
2 2.0 1
3 0.0 2
4 NaN 2
5 2.0 2
6 NaN 3
7 NaN 3
8 1.0 3
9 NaN 4
10 NaN 4
11 NaN 4
def f(y):
    a = list(set(x)-set(y))
    if len(a) == 1:
        return y.fillna(a[0])
    elif len(a) == 2:
        return y.fillna(a[0], limit=1).fillna(a[1])
    elif len(a) == 3:
        y = pd.Series(x, index=y.index)
        return y
    else:
        return y
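# group_keys=False keeps the original row index, so the result aligns with df on recent pandas versions
df['data'] = df.groupby('group', group_keys=False)['data'].apply(f).astype(int)
print(df)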
data group
0 0 1
1 1 1
2 2 1
3 0 2
4 1 2
5 2 2
6 0 3
7 2 3
8 1 3
9 0 4
10 1 4
11 2 4
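The branches above are written out for an array of length 3. As a rough sketch of how the same idea might be generalised to any number of NaNs per group, the missing values can be paired with the NaN positions in order. The helper name `fill_missing` is hypothetical, and this swaps the set difference for `np.setdiff1d` and `apply` for `transform`. It assumes the DataFrame index is unique; if a group has more NaNs than missing values, the extra NaNs are left in place and the final `astype(int)` would then fail.

```python
import numpy as np
import pandas as pd

x = np.array([0, 1, 2])

def fill_missing(y):
    # Sorted values of x that do not appear in this group
    missing = np.setdiff1d(x, y.dropna().to_numpy())
    out = y.copy()
    # Index labels of the NaN rows, in their original order
    nan_idx = out.index[out.isna()]
    # Fill as many NaNs as there are missing values to hand out
    n = min(len(nan_idx), len(missing))
    if n > 0:
        out.loc[nan_idx[:n]] = missing[:n]
    return out

df = pd.DataFrame({
    'data': [0, 1, 2, 0, np.nan, 2, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
    'group': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
})

# transform returns a result aligned with df's original index
df['data'] = df.groupby('group')['data'].transform(fill_missing).astype(int)
print(df)
```

On the sample data this should give the same filled column as the output shown above.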
Upvotes: 4