Deleting a row in pandas dataframe based on condition

Question

Scenario: I have a dataframe with some NaN values scattered around. It has multiple columns; the ones of interest are "bid" and "ask".

What I want to do: I want to remove all rows where the bid column value is nan AND the ask column value is nan.

Question: What is the best way to do it?

What I already tried:

ab_df = ab_df[ab_df.bid != 'nan' and ab_df.ask != 'nan']

ab_df = ab_df[ab_df.bid.empty and ab_df.ask.empty] 

ab_df = ab_df[ab_df.bid.notnull and ab_df.ask.notnull]

But none of them work.

akuiper · Accepted Answer

You need the vectorized logical operators & or | (Python's and and or compare scalars, not pandas Series). To check for NaN values, you can use the isnull and notnull methods.

To remove all rows where the bid column value is nan AND the ask column value is nan, keep the opposite:

ab_df[ab_df.bid.notnull() | ab_df.ask.notnull()]
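To see why the first attempt in the question fails, here is a minimal sketch. The sample values in ab_df are made up for illustration; only the column names come from the question. Python's and asks each Series for a single truth value, which pandas refuses to give, while | combines the two masks element by element.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
ab_df = pd.DataFrame({
    "bid": [np.nan, 1.0, 2.0, np.nan],
    "ask": [np.nan, np.nan, 2.0, 1.0],
})

# `and` needs one True/False per operand, so pandas raises ValueError.
try:
    ab_df[ab_df.bid.notnull() and ab_df.ask.notnull()]
    and_failed = False
except ValueError:
    and_failed = True

# `|` works element-wise; note the parentheses after notnull.
kept = ab_df[ab_df.bid.notnull() | ab_df.ask.notnull()]
print(and_failed)
print(kept.index.tolist())
```

Note that notnull must be called with parentheses; without them you are passing the method itself rather than a boolean mask.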

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
        "bid": [np.nan, 1, 2, np.nan],
        "ask": [np.nan, np.nan, 2, 1]
    })

df[df.bid.notnull() | df.ask.notnull()]

#   bid ask
#1  1.0 NaN
#2  2.0 2.0
#3  NaN 1.0
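The "keep the opposite" step can also be written literally: build the mask for the rows to drop and negate it with ~. The sketch below, which rebuilds the same example frame with NumPy imported directly, suggests that both forms select the same rows; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "bid": [np.nan, 1, 2, np.nan],
    "ask": [np.nan, np.nan, 2, 1],
})

# Rows to drop: bid is NaN AND ask is NaN.
both_missing = df.bid.isnull() & df.ask.isnull()

# Negating that mask with ~ keeps every other row.
kept_by_negation = df[~both_missing]

# Same rows as the "at least one value is present" mask.
kept_by_or = df[df.bid.notnull() | df.ask.notnull()]
same = kept_by_negation.equals(kept_by_or)
print(same)
```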

If you need both columns to be non-missing:

df[df.bid.notnull() & df.ask.notnull()]

#   bid ask
#2  2.0 2.0

Another option is dropna with the thresh parameter, which keeps only the rows that have at least that many non-NaN values in the subset columns:

df.dropna(subset=['ask', 'bid'], thresh=1)

#   bid ask
#1  1.0 NaN
#2  2.0 2.0
#3  NaN 1.0

df.dropna(subset=['ask', 'bid'], thresh=2)

#   bid ask
#2  2.0 2.0
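For just two columns, the how parameter of dropna may read more directly than thresh. This is a sketch of that alternative, not part of the original answer: how="all" drops a row only when every subset column is NaN, and how="any" drops it when at least one is.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "bid": [np.nan, 1, 2, np.nan],
    "ask": [np.nan, np.nan, 2, 1],
})

# how="all": drop a row only when every subset column is NaN.
at_least_one = df.dropna(subset=["bid", "ask"], how="all")

# how="any": drop a row when any subset column is NaN.
both_present = df.dropna(subset=["bid", "ask"], how="any")

print(at_least_one.index.tolist())
print(both_present.index.tolist())
```

By default dropna returns a new dataframe, so assign the result back (for example ab_df = ab_df.dropna(...)) to keep it.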
