Bharath M Shetty

Pandas: how to fill NaN values based on the data present in a column and an array?

Let's say I have a dataframe with NaNs in some of its groups, like:

import numpy as np
import pandas as pd

df = pd.DataFrame({'data':[0,1,2,0,np.nan,2,np.nan,0,1],'group':[1,1,1,2,2,2,3,3,3]})

and a NumPy array like:

x = np.array([0,1,2])

Now, within each group, how can I fill the NaNs with the values from the NumPy array that are absent from that group? I.e. the result should be:

df = pd.DataFrame({'data':[0,1,2,0,1,2,2,0,1],'group':[1,1,1,2,2,2,3,3,3]})
   data  group
0     0      1
1     1      1
2     2      1
3     0      2
4     1      2
5     2      2
6     2      3
7     0      3
8     1      3

Let me explain how the data should be filled. Consider group 2: its values are 0, np.nan, 2. The value from the array [0,1,2] that is missing from this group is 1, so the NaN should be filled with 1.

For multiple NaN values, take for example a group with data [np.nan, 0, np.nan]: the values to fill in place of the NaNs are 1 and 2, resulting in [1, 0, 2].
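The fill rule described above can be sketched for a single group. This is a minimal illustration, not part of the original question: `fill_from_array` is a hypothetical helper, and it assumes that the missing values are assigned to the NaNs in ascending order from top to bottom (consistent with the [1, 0, 2] example) and that the number of NaNs equals the number of missing values. It does not check that assumption.

```python
import numpy as np
import pandas as pd

x = np.array([0, 1, 2])

def fill_from_array(y, x):
    # Values of x absent from the non-missing entries of y, smallest first
    missing = sorted(set(x.tolist()) - set(y.dropna().tolist()))
    values = y.to_numpy(dtype=float, copy=True)
    # Assumption: the i-th NaN (top to bottom) receives the i-th missing value
    values[np.isnan(values)] = missing
    return pd.Series(values, index=y.index)

print(fill_from_array(pd.Series([0, np.nan, 2]), x).tolist())       # [0.0, 1.0, 2.0]
print(fill_from_array(pd.Series([np.nan, 0, np.nan]), x).tolist())  # [1.0, 0.0, 2.0]
```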


Answers (1)

jezrael

First find the value that is missing from each group, then pass it to fillna:

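def f(y):
    # values of x that do not occur in this group
    a = list(set(x) - set(y))
    # if nothing is missing, fall back to 1 (only matters if the group still has a NaN)
    a = 1 if len(a) == 0 else a[0]
    return y.fillna(a)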

# transform returns a result aligned to df's original index, so it can be assigned back
df['data'] = df.groupby('group')['data'].transform(f).astype(int)
print (df)
   data  group
0     0      1
1     1      1
2     2      1
3     0      2
4     1      2
5     2      2
6     2      3
7     0      3
8     1      3

EDIT: a version that also handles groups with two or three NaNs:

df = pd.DataFrame({'data':[0,1,2,0,np.nan,2,np.nan,np.nan,1, np.nan, np.nan, np.nan],
                   'group':[1,1,1,2,2,2,3,3,3,4,4,4]})
x = np.array([0,1,2])
print (df)
    data  group
0    0.0      1
1    1.0      1
2    2.0      1
3    0.0      2
4    NaN      2
5    2.0      2
6    NaN      3
7    NaN      3
8    1.0      3
9    NaN      4
10   NaN      4
11   NaN      4

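def f(y):
    # values of x missing from this group, in ascending order
    a = sorted(set(x) - set(y))
    if len(a) == 1:
        # one value missing: fill the NaN with it
        return y.fillna(a[0])
    elif len(a) == 2:
        # two values missing: the first NaN gets a[0], the remaining one gets a[1]
        return y.fillna(a[0], limit=1).fillna(a[1])
    elif len(a) == 3:
        # all values missing: replace the whole group with x
        y = pd.Series(x, index=y.index)
        return y
    else:
        return y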

# transform returns a result aligned to df's original index, so it can be assigned back
df['data'] = df.groupby('group')['data'].transform(f).astype(int)
print (df)
    data  group
0      0      1
1      1      1
2      2      1
3      0      2
4      1      2
5      2      2
6      0      3
7      2      3
8      1      3
9      0      4
10     1      4
11     2      4
