Reputation: 213
I have a dataframe where I take a subset of columns and then want to filter out the rows that conditionally match two criteria.
Here's what the dataframe looks like:
Name      Err1  Err2  Page
Amazon    404   201   Shopping
Facebook  202         Social
Goku                  Shopping
Ross            203   Shopping
I replace the nulls with, say, '-', group the data by Err1 and Err2, and also get the unique count of Err1.
df['Err1'].fillna("-", inplace=True)
df['Err2'].fillna("-", inplace=True)
df.groupby(["Name", "Err1", "Err2"]).agg({"Err1": "count"})
This gives me:
Name      Err1  Err2  Err1
Amazon    404   201   1
Facebook  202   -     1
Goku      -     -     1
Ross      -     203   1
a) I would like to remove all rows where both Err1 and Err2 == "-", i.e. display a row only if either Err1 or Err2 is not '-'.
b) In the above, how can I get the unique count of Err1 and Err2 combined, instead of the unique count of just Err1?
I don't want to use for loops and iterate through the data, as the dataset is over 100k lines. Is there an efficient way to achieve this?
Upvotes: 1
Views: 576
Reputation: 323316
Here is one way. First you need to drop the rows where the Errs are all null (note the `~` to keep rows where they are *not* all null):
df = df[~df[['Err1', 'Err2']].isnull().all(1)].copy()
As for the unique count: when you groupby with Err1 and Err2 together, it already counts by both of them:
df.fillna('NaN').groupby(["Name", "Err1", "Err2"]).size()
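Putting both steps together, here is a minimal, self-contained sketch using the sample data from the question (the column names and '-' fill value are taken from there; the error codes are stored as strings purely for illustration):

```python
import pandas as pd

# Rebuild the sample frame from the question; None marks the missing errors
df = pd.DataFrame({
    "Name": ["Amazon", "Facebook", "Goku", "Ross"],
    "Err1": ["404", "202", None, None],
    "Err2": ["201", None, None, "203"],
    "Page": ["Shopping", "Social", "Shopping", "Shopping"],
})

# Keep only rows where at least one of Err1/Err2 is present
# (drops Goku, whose Err1 and Err2 are both null)
df = df[~df[["Err1", "Err2"]].isnull().all(axis=1)].copy()

# Fill the remaining gaps with '-' and count each (Name, Err1, Err2)
# combination; grouping on both columns counts them combined
out = df.fillna("-").groupby(["Name", "Err1", "Err2"]).size()
print(out)
```

Because the boolean mask and groupby are vectorized, this avoids any Python-level loop over the 100k rows.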
Upvotes: 4