Reputation: 580
I just did:
len(my_df.drop_duplicates())
Is there not a more elegant way to do this? In R you can do:
nrow(distinct(my_df))
To me that is very readable. drop_duplicates() feels worrying because, as a new Python user, I get lost over which operations happen in place and which ones return a copy that you need to store or overwrite for the change to persist.
The fact that searching on Google didn't give me a clear one-click answer for what I'd think is a simple function worried me a bit...
Thanks!
Upvotes: 0
Views: 60
Reputation: 323396
In pandas you can also do this another way, with groupby, or with duplicated plus sum:
df.groupby(list(df)).ngroups
(~df.duplicated()).sum()
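As a rough check, here is a minimal sketch comparing these counts with the drop_duplicates version on a small made-up frame (the data and the variable names are only for illustration). It uses the ngroups attribute, which is the number of groups; ngroup() with parentheses returns a group number for every row instead of a count.

```python
import pandas as pd

# Hypothetical example frame: rows 0 and 2 are identical.
df = pd.DataFrame({"a": [1, 2, 1, 3], "b": ["x", "y", "x", "z"]})

# Rows left after dropping exact duplicate rows.
n_drop = len(df.drop_duplicates())

# Number of distinct groups when grouping on every column.
n_groups = df.groupby(list(df)).ngroups

# duplicated() marks repeats of an earlier row; negate and sum to count first occurrences.
n_mask = int((~df.duplicated()).sum())

print(n_drop, n_groups, n_mask)
```

One caveat, as far as I know: groupby leaves out rows with missing values in the grouping columns by default, so the groupby count can come out lower than the other two when the data contains NaN. In pandas 1.1 or later, passing dropna=False to groupby should keep those rows.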
Also, as an R and Python user, I know it is hard to switch from R to pandas, but the most common way is drop_duplicates.
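On the in-place worry: by default drop_duplicates returns a new DataFrame and leaves the original alone, so wrapping it in len() only counts and changes nothing. A small sketch, using a made-up my_df purely for illustration:

```python
import pandas as pd

# Hypothetical example frame with one repeated row.
my_df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# Counting distinct rows: the deduplicated frame is a temporary result.
n_distinct = len(my_df.drop_duplicates())

# my_df still has all of its rows, because nothing was assigned back.
rows_after_count = len(my_df)

# To actually keep the deduplicated data, store the result under a name.
deduped = my_df.drop_duplicates()

print(n_distinct, rows_after_count, len(deduped))
```

The original only changes if you assign the result back (my_df = my_df.drop_duplicates()) or explicitly pass inplace=True.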
Upvotes: 1