sparrow

Reputation: 11460

Best way to filter out empty DataFrame cells with Pandas apply

I'm using the apply method to pass data from a Pandas DataFrame to a function. If a cell is blank, the value's type is "NoneType" or "float", which is incompatible with the string comparisons that my function does. I'm filtering out this data using:

if isinstance(col1,str): #to make sure the data is a string.

Is there a better way to do this, since this check goes against the concept of duck typing?

For context here is my code:

def check_cols(col1,col2):
    if isinstance(col1,str):
        var = col1
    else:
        var = col2
    #the logical part of the function is here

#passing in data from two columns
dfcat['col3'] = dfcat.apply(lambda x: check_cols(x['col1'],x['col2']),axis=1)

Upvotes: 1

Views: 311

Answers (1)

jezrael

Reputation: 862396

I think you can use combine_first if you need to replace None and NaN:

dfcat['col3'] = dfcat['col1'].combine_first(dfcat['col2'])
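As a minimal sketch of what this does, assuming a small made-up frame where col1 mixes missing values, a string and numbers: combine_first only fills the missing entries from col2, so non-string values such as 10 and 12.7 are kept as they are.

```python
import numpy as np
import pandas as pd

# Made-up sample data: col1 mixes NaN, None, a string and numbers.
dfcat = pd.DataFrame({
    'col1': [np.nan, 'aa', None, 10, 12.7],
    'col2': ['e', 'r', 't', 'k', 'g'],
})

# Keep col1 where it is not missing, otherwise fall back to col2.
dfcat['col3'] = dfcat['col1'].combine_first(dfcat['col2'])

# Numbers in col1 are not missing, so they survive unchanged.
col3_values = dfcat['col3'].tolist()
print(col3_values)
```

So this fits when only blank cells are the problem, not when every non-string should be replaced.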

But if you need to replace all non-strings, use mask with a boolean mask:

mask = dfcat['col1'].apply(lambda x: isinstance(x,str))
dfcat['col3'] = dfcat['col2'].mask(mask, dfcat['col1'])

Sample:

import numpy as np
import pandas as pd

dfcat = pd.DataFrame({'col1':[np.nan, 'aa', None, 10, 12.7], 'col2':['e','r','t','k', 'g']})
print (dfcat)
   col1 col2
0   NaN    e
1    aa    r
2  None    t
3    10    k
4  12.7    g

mask = dfcat['col1'].apply(lambda x: isinstance(x,str))
dfcat['col3'] = dfcat['col2'].mask(mask, dfcat['col1'])
print (dfcat)
   col1 col2 col3
0   NaN    e    e
1    aa    r   aa
2  None    t    t
3    10    k    k
4  12.7    g    g
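One way this could tie back to the function in the question, as a rough sketch only: pick the string value per row first, then apply the string logic to that single column. Here check_value is a hypothetical stand-in for the logic that the question leaves out, and upper() is just a placeholder operation.

```python
import numpy as np
import pandas as pd

dfcat = pd.DataFrame({
    'col1': [np.nan, 'aa', None, 10, 12.7],
    'col2': ['e', 'r', 't', 'k', 'g'],
})

def check_value(var):
    # Hypothetical placeholder for the string logic elided in the question.
    return var.upper()

# True where col1 holds a real string, False for NaN, None and numbers.
is_str = dfcat['col1'].apply(lambda x: isinstance(x, str))

# Take col1 where it is a string, otherwise col2, so every value is a string.
chosen = dfcat['col2'].mask(is_str, dfcat['col1'])

# The function now only ever receives strings, so it needs no type check.
dfcat['col3'] = chosen.apply(check_value)
result = dfcat['col3'].tolist()
print(result)
```

The type check then lives in one place (the mask) instead of inside the row-wise function.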

Upvotes: 1
