Reputation: 4940

How to find the number of non-unique rows after groupby()

I have a DataFrame df with two columns, ID_owner and ID_phone, and I need to find:

How many people have more than n phones.
Which phones are shared among several owners, i.e. ID_phone values associated with more than one ID_owner.

In order to answer the first question, I have tried:

df.groupby('ID_owner')['ID_phone'].nunique().to_frame()

It doesn't seem to work, because I need to count the number of duplicate rows per ID_owner after grouping. I ran into the same issue with the second question.

Is there a specific method or function in pandas for this kind of problem?

For the first question, the output should be a DataFrame with two columns: ID_owner, and the number of smartphones that owner has.
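A minimal sketch of the difference between counting rows and counting distinct phones, on a small hypothetical frame in which one (owner, phone) pair is recorded twice. The column name n_phones is a hypothetical choice; reset_index(name=...) is what turns ID_owner from the index back into a column, giving the two-column shape described above.

```python
import pandas as pd

# Hypothetical sample: owner "a" has phone 1 recorded twice.
df = pd.DataFrame({
    "ID_owner": ["a", "a", "a", "b"],
    "ID_phone": [1, 1, 2, 3],
})

# size() counts rows per owner, so the duplicate (a, 1) row is counted twice.
rows_per_owner = df.groupby("ID_owner").size()

# nunique() counts distinct phones per owner, so the duplicate is counted once.
# reset_index(name=...) moves ID_owner back into a column: two columns in total.
phones_per_owner = (
    df.groupby("ID_owner")["ID_phone"].nunique().reset_index(name="n_phones")
)

print(rows_per_owner)
print(phones_per_owner)
```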

Upvotes: 0

Answers (2)

id101112

Reputation: 1042

df1.groupby('ID_owner').agg({'ID_phone': 'unique'}).reset_index()

Alternatively, you can use apply:

df1.groupby('ID_owner')['ID_phone'].apply(lambda s: s.unique()).reset_index()

Either way, the output looks like this:

    ID_owner  ID_phone
0   Dave      [34567]
1   Donald    [34353]
2   Jae       [12345]
3   Shankar   [23456, 22222]
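A runnable version of the list output, assuming hypothetical input rows chosen to reproduce the table above (Shankar's 23456 is listed twice to show that the repeat is dropped). It uses SeriesGroupBy.unique, the column-selected form of the same aggregation.

```python
import pandas as pd

# Hypothetical input rows that reproduce the table above.
df1 = pd.DataFrame({
    "ID_owner": ["Dave", "Donald", "Jae", "Shankar", "Shankar", "Shankar"],
    "ID_phone": [34567, 34353, 12345, 23456, 22222, 23456],
})

# One row per owner; ID_phone holds an array of that owner's distinct phones,
# in order of first appearance.
phone_lists = df1.groupby("ID_owner")["ID_phone"].unique().reset_index()

print(phone_lists)
```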

To get the count rather than the list itself, use nunique:

df1.groupby('ID_owner').agg({'ID_phone': 'nunique'}).reset_index().rename(columns={'ID_phone': 'count'})

df1.groupby('ID_owner')['ID_phone'].apply(lambda s: s.nunique()).reset_index(name='count')

Both result in:

    ID_owner  count
0   Dave      1
1   Donald    1
2   Jae       1
3   Shankar   2
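An alternative route, not part of the original answer, that matches the question's "count duplicate rows" framing: drop repeated (ID_owner, ID_phone) pairs with drop_duplicates, then count the remaining rows per owner with size(). The input rows are hypothetical, chosen to be consistent with the output table above; both routes give the same counts.

```python
import pandas as pd

# Hypothetical input rows; Shankar's phone 23456 is listed twice on purpose.
df1 = pd.DataFrame({
    "ID_owner": ["Dave", "Donald", "Jae", "Shankar", "Shankar", "Shankar"],
    "ID_phone": [34567, 34353, 12345, 23456, 22222, 23456],
})

# Route 1: distinct phones per owner via nunique.
via_nunique = df1.groupby("ID_owner")["ID_phone"].nunique().reset_index(name="count")

# Route 2: remove repeated (owner, phone) pairs, then count rows per owner.
via_dedup = (
    df1.drop_duplicates(subset=["ID_owner", "ID_phone"])
    .groupby("ID_owner")
    .size()
    .reset_index(name="count")
)

print(via_nunique)
print(via_dedup)
```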

Upvotes: 0

kevins_1

Reputation: 1306

It looks like you were slicing your table prematurely, whereas what you want is to keep the aggregated table. For your first question, the following code works:

n = 2

df.groupby('ID_owner').agg({'ID_phone': 'nunique'}).query('ID_phone > @n').shape[0]
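A self-contained sketch of this approach on a small hypothetical frame. Besides the count from shape[0], the index of the filtered frame tells you which owners qualify; owner "b" lists the same phone twice, so it has only one distinct phone.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "ID_owner": ["a", "a", "a", "b", "b", "c"],
    "ID_phone": [1, 2, 3, 4, 4, 5],
})
n = 2

# Distinct phones per owner, with ID_owner as the index.
phones = df.groupby("ID_owner").agg({"ID_phone": "nunique"})

# Keep owners with more than n distinct phones; @n refers to the local variable.
heavy = phones.query("ID_phone > @n")

print(heavy.shape[0])     # 1
print(list(heavy.index))  # ['a']
```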

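For your second question, swap the two IDs in the above query (group by ID_phone and count unique ID_owner values), set n = 1, and take the ID_phone values that remain (they are in the index of the result) rather than the row count.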
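A sketch of the swapped version on hypothetical data, using a boolean mask instead of query (either works). Phone 30 is listed twice for the same owner, so it is not counted as shared.

```python
import pandas as pd

# Hypothetical sample: phone 10 is shared by owners "a" and "b".
df = pd.DataFrame({
    "ID_owner": ["a", "b", "b", "c", "c"],
    "ID_phone": [10, 10, 20, 30, 30],
})

# Swap the roles of the two IDs: group by phone, count distinct owners.
owners_per_phone = df.groupby("ID_phone")["ID_owner"].nunique()

# Phones held by more than one owner (n = 1).
shared = owners_per_phone[owners_per_phone > 1]

print(list(shared.index))  # [10]
print(len(shared))         # 1
```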

Upvotes: 1
