sidman

Reputation: 213

Pandas - replace row with null values and drop rows matching two conditions

I have a dataframe where I take a subset of columns and then want to filter out the rows that match two conditions.

Here's what the dataframe looks like:

Name     Err1    Err2    Page 
Amazon   404     201     Shopping
Facebook 202             Social
Goku                     Shopping
Ross             203     Shopping

I replace the nulls with, say, '-', group the data by Err1 and Err2, and also get the unique count of Err1.

    df['Err1'].fillna("-", inplace=True)
    df['Err2'].fillna("-", inplace=True)
    df.groupby(["Name", "Err1", "Err2"]).agg({"Err1": "count"})

This gives me:

Name     Err1    Err2    Err1 
Amazon   404     201     1
Facebook 202      -      1
Goku      -       -      1
Ross      -      203     1

a) I would like to remove all rows where both "Err1" and "Err2" are "-", i.e. display a row only if either Err1 or Err2 is not '-'.
b) In the above, how can I get the unique count of Err1 and Err2 combined, instead of the unique count of just Err1?

I don't want to use for loops and iterate through the data, as the dataset is over 100k lines. Is there an efficient way to achieve this?

Upvotes: 1

Views: 576

Answers (1)

BENY

Reputation: 323316

Here is one way. First, you need to drop the rows where both Err columns are null:

df = df[~df[['Err1', 'Err2']].isnull().all(1)].copy()
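A minimal, runnable sketch of that filter on the question's sample data (the blanks in the posted table are assumed to be NaN; the `~` negation keeps rows where at least one Err column is non-null):

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (blanks assumed to be NaN)
df = pd.DataFrame({
    "Name": ["Amazon", "Facebook", "Goku", "Ross"],
    "Err1": [404, 202, np.nan, np.nan],
    "Err2": [201, np.nan, np.nan, 203],
    "Page": ["Shopping", "Social", "Shopping", "Shopping"],
})

# ~...all(1): drop a row only when Err1 AND Err2 are both null
df = df[~df[["Err1", "Err2"]].isnull().all(1)].copy()
print(df["Name"].tolist())  # ['Amazon', 'Facebook', 'Ross']
```

This is vectorized, so it stays fast on 100k+ rows; only the all-null Goku row is dropped.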

As for the unique count: when you group by Err1 and Err2 together, the count is already taken over both of them combined:

df.fillna('NaN').groupby(["Name","Err1", "Err2"]).size()
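For example, a sketch on the rows that survive the filter (sample data reconstructed from the question), showing that `size()` counts each unique (Name, Err1, Err2) combination:

```python
import numpy as np
import pandas as pd

# Rows left after dropping the all-null Goku row (sketch data)
df = pd.DataFrame({
    "Name": ["Amazon", "Facebook", "Ross"],
    "Err1": [404, 202, np.nan],
    "Err2": [201, np.nan, 203],
})

# size() gives one count per unique (Name, Err1, Err2) group,
# so Err1 and Err2 are counted as a combined key
counts = df.fillna("NaN").groupby(["Name", "Err1", "Err2"]).size()
print(counts)
```

Each of the three distinct combinations appears once, so every group's count is 1.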

Upvotes: 4
