pandas fill cells based on succession of other cells

Question

I have a df with lots of missing data but essentially the same columns (originating from merging data sets). As an example, consider the following:

temp = pd.DataFrame({"fruit_1": ["apple", "pear", "don't want to tell", np.nan, np.nan, np.nan],
                     "fruit_2": [np.nan, np.nan, "don't want to tell", "apple", "don't want to tell", np.nan],
                     "fruit_3": ["apple", np.nan, "pear", "don't want to tell", np.nan, "pear"]})

I now want to merge them into one column; conflicts should be resolved as follows:

np.nan always overridden with other information
"don't want to tell" only overrides np.nan
any other value overrides only np.nan and "don't want to tell" (i.e. keep first value).

I have tried creating a new column and using apply (see below).

temp.insert(0, "fruit", np.nan)
temp['fruit'].apply(lambda row: row["fruit"] if np.isnan(row["fruit"]) and not np.isnan(row["fruit_1"]) else np.nan) # map col

The code, however, produces a TypeError: 'float' object is not subscriptable

Can someone tell me whether (1) this is a feasible approach in general - and if so, what my mistake is? And (2) what would be the most efficient way to do this?

Thanks a lot in advance.

** EDIT ** The expected output is

                fruit             
0               apple         
1                pear       
2                pear  
3               apple             
4  don't want to tell
5                pear

BENY · Accepted Answer

With ffill and additional np.where

s=temp.mask(temp=="don't want to tell").bfill(1).iloc[:,0]
s=np.where((temp=="don't want to tell").any(1)&s.isnull(),"don't want to tell",s)
s
Out[17]: 
array(['apple', 'pear', 'pear', 'apple', "don't want to tell", 'pear'],
      dtype=object)
temp['New']=s
temp
Out[19]: 
              fruit_1  ...                 New
0               apple  ...               apple
1                pear  ...                pear
2  don't want to tell  ...                pear
3                 NaN  ...               apple
4                 NaN  ...  don't want to tell
5                 NaN  ...                pear
[6 rows x 4 columns]

pandas fill cells based on succession of other cells

Answers (1)

Related Questions