Reputation: 13
I am confused by some of the examples I see for pandas. For example, this is shortened from a post I recently read:
df[df.duplicated()|df()]
What I don't understand is why df needs to be on the outside: df[df.duplicated()] vs just using df.duplicated(). In the documentation I have not yet seen the first example, everything is presented in the format df.something_doing(). But I see many examples such as df[df.something_doing()], and I don't understand what the df on the outside does.
Upvotes: 1
Views: 86
Reputation: 38415
df.duplicated() returns a boolean Series: a mask with one value per row, True where the row duplicates an earlier row and False otherwise. It does not return the rows themselves. If you want the rows of the dataframe selected by that mask, you need:
df[df.duplicated()]
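To make the difference concrete, here is a minimal sketch. The data is made up for illustration (it is not from the post you read), and the names mask and dupes are just placeholders:

```python
import pandas as pd

# Hypothetical data: row 2 is an exact repeat of row 0.
df = pd.DataFrame({"col1": [1, 0, 1], "id": ["a", "a", "a"]})

# The inner call only answers "is this row a duplicate?" for each row.
mask = df.duplicated()   # boolean Series, same index as df

# The outer df[...] uses that answer to keep the rows marked True.
dupes = df[mask]         # DataFrame containing only the duplicated rows

print(mask)
print(dupes)
```

So the df on the outside is the thing being filtered, and the expression inside the brackets only decides which rows to keep.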
Another simple example: consider this dataframe
   col1 id
0     1  a
1     0  a
2     1  a
3     1  b
If you only want the rows where 'id' is 'a',
df.id == 'a'
would give you a boolean mask, but
df[df.id == 'a']
would return the matching rows of the dataframe:
   col1 id
0     1  a
1     0  a
2     1  a
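If you want to try this yourself, the example above can be rebuilt roughly like this. This is a sketch: the constructor call is my assumption about how the dataframe was made, and I use df["id"] rather than df.id, which selects the same column here:

```python
import pandas as pd

# Rebuild the example dataframe shown above.
df = pd.DataFrame({"col1": [1, 0, 1, 1], "id": ["a", "a", "a", "b"]})

# Step 1: the comparison alone produces a boolean Series (the mask).
mask = df["id"] == "a"

# Step 2: indexing the dataframe with the mask keeps the True rows.
filtered = df[mask]

# df.loc[mask] is an equivalent, more explicit spelling of the same filter.
filtered_loc = df.loc[mask]

print(mask)
print(filtered)
```

Splitting it into two steps like this is only for readability; df[df["id"] == "a"] does the same thing in one line.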
Upvotes: 2