Leandro Baruch

Reputation: 117

Creating a second dataframe considering 2 conditions from first dataframe

I have a main DataFrame and have found some rows that I don't want. The queries below select those rows:

df.query("group == 'treatment' and landing_page != 'new_page'") 
df.query("landing_page == 'new_page' and group != 'treatment'")

Now I want a df2 containing the entire df EXCEPT the rows returned by the code above. I am having a hard time creating this df2. Any ideas?

My actual code:

df2 = df.query("group == 'treatment' and landing_page == 'new_page'") and df.query("group == 'control' and landing_page == 'old_page'")

I am receiving this error: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 1

Views: 40

Answers (1)

cs95

Reputation: 402493

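Change query to eval, which returns a boolean mask (one True/False per row) instead of the filtered rows, then combine the masks and invert the result when indexing df.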

m1 = df.eval("group == 'treatment' and landing_page != 'new_page'") 
m2 = df.eval("landing_page == 'new_page' and group != 'treatment'")

df_out = df[~(m1 | m2)]
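
To see what this does, here is a minimal sketch on a small made-up frame. The four sample rows are an assumption for illustration; only the column names and the two conditions come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "group": ["treatment", "treatment", "control", "control"],
    "landing_page": ["new_page", "old_page", "old_page", "new_page"],
})

# eval returns a boolean Series aligned with df's index,
# whereas query would return the matching rows themselves.
m1 = df.eval("group == 'treatment' and landing_page != 'new_page'")
m2 = df.eval("landing_page == 'new_page' and group != 'treatment'")

# m1 | m2 marks every unwanted row; ~ flips it to mark the rows to keep.
df2 = df[~(m1 | m2)]
print(df2)
```

Rows 1 and 3 of the sample (treatment with old_page, control with new_page) are dropped; rows 0 and 2 remain.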

Or, a little more generically,

import numpy as np

stmts = [
    "group == 'treatment' and landing_page != 'new_page'",
    "landing_page == 'new_page' and group != 'treatment'"
]

# OR all the masks together, then invert to keep the remaining rows
df_out = df[~np.logical_or.reduce([df.eval(stmt) for stmt in stmts])]
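
As for the ValueError in the question: Python's `and` asks each operand for a single truth value, and a DataFrame refuses to give one. The element-wise operators `&`, `|`, `^` and `~` on boolean Series avoid that. The sketch below shows the same filter without eval, again on made-up sample rows (an assumption, not the asker's data). It relies on the observation that the two unwanted conditions together amount to "exactly one of treatment / new_page holds".

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "group": ["treatment", "treatment", "control", "control"],
    "landing_page": ["new_page", "old_page", "old_page", "new_page"],
})

is_treatment = df["group"] == "treatment"
is_new_page = df["landing_page"] == "new_page"

# Unwanted rows: exactly one of the two flags is True (element-wise XOR).
mismatch = is_treatment ^ is_new_page
df2 = df[~mismatch]

# Using `and` between two DataFrames reproduces the error from the question.
try:
    df[is_treatment] and df[is_new_page]
    raised = False
except ValueError:
    raised = True
```

Whether eval strings or plain masks read better is a matter of taste; both produce the same rows here.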

Upvotes: 1
