Reputation: 580

Easiest way to count distinct number of rows in Pandas dataframe?

I just did:

len(my_df.drop_duplicates())

Is there not a more elegant way to do this? In R you can do:

nrow(distinct(my_df))

To me that is very readable. drop_duplicates() feels worrying because, as a new Python user, I lose track of which operations happen in place and which ones return a copy that you need to store or assign back for the change to persist.

The fact that searching on Google didn't give me a clear one-click answer for what I'd think is a simple function worried me a bit...

Thanks!

Upvotes: 0

Answers (2)

BENY

Reputation: 323396

In pandas you can also do this with groupby, or with duplicated and sum:

df.groupby(list(df)).ngroups

(~df.duplicated()).sum()

As both an R and Python user, I know that it is hard to switch from R to pandas, but the most common way is drop_duplicates.
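
As a minimal sketch of the options above (the sample frame and its column names are made up for illustration), the three counts agree on data without missing values, and drop_duplicates returns a new DataFrame instead of modifying the original. Note that groupby drops rows with missing keys by default, so its count may differ when the data contains NaN.

```python
import pandas as pd

# Hypothetical sample data: 4 rows, 3 of them distinct
df = pd.DataFrame({"a": [1, 1, 2, 2], "b": ["x", "x", "y", "z"]})

n_drop = len(df.drop_duplicates())      # drop_duplicates returns a new frame
n_dup = int((~df.duplicated()).sum())   # count rows that are first occurrences
n_grp = df.groupby(list(df)).ngroups    # number of distinct key combinations

print(n_drop, n_dup, n_grp)  # 3 3 3
print(len(df))               # 4, the original frame is unchanged
```

To keep the deduplicated frame you would assign the result, for example deduped = df.drop_duplicates().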

Upvotes: 1

Gokturk Sahin

Reputation: 91

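len(pd.unique(my_df["some_column"]))

I guess you are looking for unique. Note that pd.unique expects 1-D input, so this counts the distinct values of one column ("some_column" is a placeholder name); it does not count distinct rows.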
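
For comparison, here is a small sketch assuming pd.unique is applied to a single column, since it is documented for 1-D input. The frame and column names are hypothetical. It shows that counting distinct values in one column is a different question from counting distinct rows.

```python
import pandas as pd

# Hypothetical sample data
my_df = pd.DataFrame({"a": [1, 1, 2, 2], "b": ["x", "x", "y", "z"]})

n_values_a = len(pd.unique(my_df["a"]))  # distinct values in column "a"
n_values_a_alt = my_df["a"].nunique()    # same count here; nunique skips NaN by default
n_rows = len(my_df.drop_duplicates())    # distinct rows across all columns

print(n_values_a, n_values_a_alt, n_rows)  # 2 2 3
```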

Upvotes: 0
