A_Dog

Reputation: 13

pandas syntax examples confusion

I am confused by some of the examples I see for pandas. For example, this is shortened from a post I recently read:

df[df.duplicated()|df()]

What I don't understand is why df needs to be on the outside: df[df.duplicated()] vs just using df.duplicated(). In the documentation I have not yet seen the first form; everything is presented in the format df.something_doing(). But I see many examples such as df[df.something_doing()], and I don't understand what the df on the outside does.

Upvotes: 1

Views: 86

Answers (1)

Vaishali

Reputation: 38415

df.duplicated() returns a boolean Series with one value per row: True if the row duplicates an earlier row, False otherwise. That Series is only a mask, not the data itself. If you want the rows of the dataframe that the mask selects, you need to index the dataframe with it:

df[df.duplicated()]
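To see the difference between the two forms, here is a minimal sketch. The dataframe and the names `mask` and `dupes` are made up for illustration; only `pd.DataFrame` and `duplicated()` come from pandas itself.

```python
import pandas as pd

# Hypothetical data: the row at index 2 repeats the row at index 0
df = pd.DataFrame({"x": [1, 2, 1], "y": ["p", "q", "p"]})

# duplicated() alone only tells you WHICH rows are duplicates
mask = df.duplicated()
print(mask)

# Indexing df with the mask returns the rows themselves
dupes = df[mask]
print(dupes)
```

So the inner call answers "which rows?", and the df on the outside turns that answer into an actual subset of the data.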

Another simple example. Consider this dataframe:

    col1  id
0   1     a
1   0     a
2   1     a
3   1     b

If you only want the rows where 'id' is 'a',

df.id == 'a'

would give you only a boolean mask, but

df[df.id == 'a']

would return the dataframe

    col1   id
0   1      a
1   0      a
2   1      a
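If you want to try this yourself, the example can be reproduced roughly like this. The variable names `mask` and `filtered` are my own, and the `.loc` line is an equivalent spelling that I am adding for comparison; it is not required.

```python
import pandas as pd

# Rebuild the example dataframe from above
df = pd.DataFrame({"col1": [1, 0, 1, 1], "id": ["a", "a", "a", "b"]})

# The comparison alone yields a boolean Series (the mask)
mask = df.id == "a"
print(mask)

# Passing the mask to df[...] keeps only the rows where it is True
filtered = df[mask]
print(filtered)

# df.loc[mask] selects the same rows and is a bit more explicit
same = df.loc[mask]
```

Note that attribute access (df.id) only works when the column name is a valid Python identifier and does not clash with a DataFrame attribute; df["id"] == "a" always works.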

Upvotes: 2
