rpb

Reputation: 3299

Fill NaN based on multiple column conditions in Pandas

The objective is to add NaN-filled rows for every combination of two columns (a and b) that is missing from the data.

a,b,c,d
2,0,1,4
5,0,5,6
6,0,1,1
1,1,1,4
4,1,5,6
5,1,5,6
6,1,1,1
1,2,2,3
6,2,5,6

For each fixed value in column b, column a should contain every value from 1 to 6. Rows added to fill the gaps should have NaN in the remaining columns (c and d).

The following code snippet does the trick:

import numpy as np
import pandas as pd

maxval_col_a = 6
lowval_col_a = 1

maxval_col_b = 2
lowval_col_b = 0
r = list(range(lowval_col_b, maxval_col_b + 1))
df = pd.DataFrame(np.column_stack([[2, 5, 6, 1, 4, 5, 6, 1, 6],
                                   [0, 0, 0, 1, 1, 1, 1, 2, 2],
                                   [1, 5, 1, 1, 5, 5, 1, 2, 5],
                                   [4, 6, 1, 4, 6, 6, 1, 3, 6]]),
                  columns=['a', 'b', 'c', 'd'])

all_df = []
for idx in r:
    # reindex this group's rows on the full range of a; missing values of a get NaN
    k = (df.loc[df['b'] == idx]
           .set_index('a')
           .reindex(range(lowval_col_a, maxval_col_a + 1))
           .reset_index())
    k['b'] = idx  # restore b, which is NaN on the newly added rows
    all_df.append(k)

df = pd.concat(all_df)

However, I am curious whether there is a more efficient or more idiomatic way of doing this with Pandas.

The expected output:

   a  b    c    d
0  1  0  NaN  NaN
1  2  0  1.0  4.0
2  3  0  NaN  NaN
3  4  0  NaN  NaN
4  5  0  5.0  6.0
5  6  0  1.0  1.0
0  1  1  1.0  4.0
1  2  1  NaN  NaN
2  3  1  NaN  NaN
3  4  1  5.0  6.0
4  5  1  5.0  6.0
5  6  1  1.0  1.0
0  1  2  2.0  3.0
1  2  2  NaN  NaN
2  3  2  NaN  NaN
3  4  2  NaN  NaN
4  5  2  NaN  NaN
5  6  2  5.0  6.0

Upvotes: 2

Views: 288

Answers (3)

tlentali

Reputation: 3455

Group by column b, set a as the index within each group, and reindex on numpy.arange(1, 7) to add the missing values of a.
Finally, reset the index to get the expected result:

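import numpy as np

# select the non-key columns explicitly so the result does not depend on
# whether groupby.apply passes the grouping column b to the function
(df.groupby('b')[['a', 'c', 'd']]
   .apply(lambda x: x.set_index('a').reindex(np.arange(1, 7)))
   .reset_index())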

Output:

    b   a   c   d
0   0   1   NaN NaN
1   0   2   1.0 4.0
2   0   3   NaN NaN
3   0   4   NaN NaN
4   0   5   5.0 6.0
5   0   6   1.0 1.0
6   1   1   1.0 4.0
7   1   2   NaN NaN
8   1   3   NaN NaN
9   1   4   5.0 6.0
10  1   5   5.0 6.0
11  1   6   1.0 1.0
12  2   1   2.0 3.0
13  2   2   NaN NaN
14  2   3   NaN NaN
15  2   4   NaN NaN
16  2   5   NaN NaN
17  2   6   5.0 6.0
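
Note that the columns come out as b, a rather than a, b as in the question. If the order matters, selecting the columns explicitly afterwards restores it. The following is only a sketch: it rebuilds the sample frame from the question so it runs on its own, and it assumes the non-key columns are selected before apply so that the grouping column b ends up only in the index.

```python
import numpy as np
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    "a": [2, 5, 6, 1, 4, 5, 6, 1, 6],
    "b": [0, 0, 0, 1, 1, 1, 1, 2, 2],
    "c": [1, 5, 1, 1, 5, 5, 1, 2, 5],
    "d": [4, 6, 1, 4, 6, 6, 1, 3, 6],
})

out = (
    df.groupby("b")[["a", "c", "d"]]
      # per group: index on a, then add the missing values of a as NaN rows
      .apply(lambda g: g.set_index("a").reindex(np.arange(1, 7)))
      # the (b, a) MultiIndex becomes two ordinary columns
      .reset_index()
)

# put the columns back in the order used in the question
out = out[["a", "b", "c", "d"]]
print(out)
```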

Upvotes: 0

Gabriele

Reputation: 343

First set columns a and b as a MultiIndex, then build a new MultiIndex containing all the combinations, and finally reindex with the new MultiIndex (all steps shown):

# set both a and b as the index (a MultiIndex)
df = df.set_index(['a', 'b'])
# build the new MultiIndex: a runs over 1..6 for each b in 0..2
new_idx_a = np.tile(np.arange(1, 6 + 1), 3)
new_idx_b = np.repeat([0, 1, 2], 6)

new_multidx = pd.MultiIndex.from_arrays([new_idx_a,
                                         new_idx_b])
# reindex; (a, b) pairs missing from df get NaN in c and d
df = df.reindex(new_multidx)
# convert the MultiIndex back to columns
df.index.names = ['a', 'b']
df.reset_index()

Result:

    a  b    c    d
0   1  0  NaN  NaN
1   2  0  1.0  4.0
2   3  0  NaN  NaN
3   4  0  NaN  NaN
4   5  0  5.0  6.0
5   6  0  1.0  1.0
6   1  1  1.0  4.0
7   2  1  NaN  NaN
8   3  1  NaN  NaN
9   4  1  5.0  6.0
10  5  1  5.0  6.0
11  6  1  1.0  1.0
12  1  2  2.0  3.0
13  2  2  NaN  NaN
14  3  2  NaN  NaN
15  4  2  NaN  NaN
16  5  2  NaN  NaN
17  6  2  5.0  6.0

Upvotes: 0

Corralien

Reputation: 120401

Build the Cartesian product of every value of b with the full range of a as a MultiIndex, then reindex on it:

mi = pd.MultiIndex.from_product([df['b'].unique(), range(1, 7)],
                                names=['b', 'a']).swaplevel()

out = df.set_index(['a', 'b']).reindex(mi).reset_index()
print(out)

# Output
    a  b    c    d
0   1  0  NaN  NaN
1   2  0  1.0  4.0
2   3  0  NaN  NaN
3   4  0  NaN  NaN
4   5  0  5.0  6.0
5   6  0  1.0  1.0
6   1  1  1.0  4.0
7   2  1  NaN  NaN
8   3  1  NaN  NaN
9   4  1  5.0  6.0
10  5  1  5.0  6.0
11  6  1  1.0  1.0
12  1  2  2.0  3.0
13  2  2  NaN  NaN
14  3  2  NaN  NaN
15  4  2  NaN  NaN
16  5  2  NaN  NaN
17  6  2  5.0  6.0
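
The same idea can be wrapped in a function so that the bounds from the question (lowval_col_a, maxval_col_a) are not hard-coded. This is only a sketch: fill_missing_pairs is a hypothetical helper name, not a pandas function, and it assumes each (a, b) pair occurs at most once in the input.

```python
import numpy as np
import pandas as pd


def fill_missing_pairs(df, a_low, a_high, a_col="a", b_col="b"):
    """Hypothetical helper: one row per (a, b) pair, NaN where the pair is absent."""
    # b-major Cartesian product: every a in [a_low, a_high] for each observed b
    full = pd.MultiIndex.from_product(
        [np.sort(df[b_col].unique()), range(a_low, a_high + 1)],
        names=[b_col, a_col],
    ).swaplevel()  # level order becomes (a, b); row order stays b-major
    # reindex needs unique (a, b) pairs in the input (assumed here)
    return df.set_index([a_col, b_col]).reindex(full).reset_index()


# sample data copied from the question
df = pd.DataFrame({
    "a": [2, 5, 6, 1, 4, 5, 6, 1, 6],
    "b": [0, 0, 0, 1, 1, 1, 1, 2, 2],
    "c": [1, 5, 1, 1, 5, 5, 1, 2, 5],
    "d": [4, 6, 1, 4, 6, 6, 1, 3, 6],
})

out = fill_missing_pairs(df, a_low=1, a_high=6)
print(out)
```

Because the full index is built once and applied in a single reindex, there is no per-group Python loop, which is the main difference from the approach in the question.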

Upvotes: 1
