Reputation: 4940
I have a data frame df with two features, ID_owner and ID_phone. I have to find:
1. the ID_owner values having more than n phones;
2. the ID_phone values having one or more ID_owner.
In order to answer the first question, I have tried:
df.groupby('ID_owner')['ID_phone'].nunique().to_frame()
This doesn't seem to work, because I need to count the number of duplicate rows per ID_owner after the grouping. I ran into the same issue with the second question.
I would like to know whether there is a specific method or function in pandas for this kind of problem.
The output for the first question should be a dataframe with two columns: one showing the ID_owner and the other showing the number of smartphones that ID_owner owns.
Upvotes: 0
Views: 112
Reputation: 1042
df1.groupby('ID_owner').agg({'ID_phone': 'unique'}).reset_index()
Alternatively, you can use apply. This example runs on a different frame, with columns User_owner and zipcode:
df1.groupby('User_owner').apply(lambda x:x.zipcode.unique()).reset_index()
This gives the output:
  User_owner         zipcode
0       Dave         [34567]
1     Donald         [34353]
2        Jae         [12345]
3    Shankar  [23456, 22222]
To get a count instead, use the nunique function:
df1.groupby('ID_owner').agg({'ID_phone': 'nunique'}).reset_index().rename(columns={'ID_phone': 'count'})
or
df1.groupby('User_owner').apply(lambda x:x.zipcode.nunique()).reset_index(name ='count')
which, on the User_owner example, results in
  User_owner  count
0       Dave      1
1     Donald      1
2        Jae      1
3    Shankar      2
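For reference, here is a minimal sketch of the nunique approach using the question's column names. The sample data is invented for illustration, and the result column name n_phones is an assumption, not something taken from the question.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "ID_owner": ["a", "a", "a", "b", "b", "c"],
    "ID_phone": [1, 1, 2, 3, 4, 5],
})

# Count distinct phones per owner; a repeated (owner, phone) row counts once.
phones_per_owner = (
    df.groupby("ID_owner")["ID_phone"]
      .nunique()
      .reset_index(name="n_phones")
)
print(phones_per_owner)
```

If repeated rows should be counted rather than collapsed, replacing .nunique() with .size() counts every row per owner instead.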
Upvotes: 0
Reputation: 1306
It looks like you were slicing your table prematurely, when what you actually want is to keep the aggregated table. To answer your first question, the following code would work:
n = 2
# number of owners with more than n distinct phones
df.groupby('ID_owner').agg({'ID_phone': 'nunique'}).query('ID_phone > @n').shape[0]
To answer your second question you can reverse the IDs in the above query, change n, and select the "ID_phone" column.
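Both queries can be sketched roughly as follows. The sample data and the thresholds n and m are assumptions made up for illustration. This version filters the grouped counts instead of calling .shape[0], so the matching IDs themselves are kept.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "ID_owner": ["a", "a", "a", "b", "c", "c"],
    "ID_phone": [1, 2, 3, 3, 4, 4],
})

n = 2  # assumed threshold on distinct phones per owner
m = 1  # assumed threshold on distinct owners per phone

# First question: owners with more than n distinct phones.
phones_per_owner = df.groupby("ID_owner")["ID_phone"].nunique()
owners = phones_per_owner[phones_per_owner > n].index.tolist()

# Second question: swap the roles of the two columns.
owners_per_phone = df.groupby("ID_phone")["ID_owner"].nunique()
phones = owners_per_phone[owners_per_phone > m].index.tolist()

print(owners, phones)
```

Using len(owners) or len(phones) recovers the plain count that .shape[0] returns in the one-liner above.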
Upvotes: 1