Reputation: 322
i want to group by id and get three most frequent city. For example i have original dataframe
ID City
1 London
1 London
1 New York
1 London
1 New York
1 Berlin
2 Shanghai
2 Shanghai
and result i want is like this:
ID first_frequent_city second_frequent_city third_frequent_city
1 London New York Berlin
2 Shanghai NaN NaN
Upvotes: 2
Views: 1014
Reputation: 22503
Another way using count
as a reference to sort, then recreate dataframe by looping through groupby
object:
df = (df.assign(count=df.groupby(["ID","City"])["City"].transform("count"))
.drop_duplicates(["ID","City"])
.sort_values(["ID","count"], ascending=False))
print (pd.DataFrame([i["City"].unique()[:3] for _, i in df.groupby("ID")]).fillna(np.NaN))
0 1 2
0 London New York Berlin
1 Shanghai NaN NaN
Upvotes: 2
Reputation: 862671
First step is use SeriesGroupBy.value_counts
for count values of City
per ID
, advantage is already values are sorted, then get counter by GroupBy.cumcount
, filter first 3
values by loc
, pivoting by DataFrame.pivot
, change columns names and last convert ID
to column by DataFrame.reset_index
:
df = (df.groupby('ID')['City'].value_counts()
.groupby(level=0).cumcount()
.loc[lambda x: x < 3]
.reset_index(name='c')
.pivot('ID','c','City')
.rename(columns={0:'first_', 1:'second_', 2:'third_'})
.add_suffix('frequent_city')
.rename_axis(None, axis=1)
.reset_index())
print (df)
ID first_frequent_city second_frequent_city third_frequent_city
0 1 London New York Berlin
1 2 Shanghai NaN NaN
Upvotes: 4
Reputation: 16683
Get the .count
by ID
and City
and then use np.where()
with .groupby()
with max
, median
and min
. Then set the index and unstack rows to columns on the max
column.
df = df.assign(count=df.groupby(['ID', 'City'])['City'].transform('count')).drop_duplicates()
df['max'] = np.where((df['count'] == df.groupby('ID')['count'].transform('min')), 'third_frequent_city', np.nan)
df['max'] = np.where((df['count'] == df.groupby('ID')['count'].transform('median')), 'second_frequent_city', df['max'])
df['max'] = np.where((df['count'] == df.groupby('ID')['count'].transform('max')), 'first_frequent_city', df['max'])
df = df.drop('count',axis=1).set_index(['ID', 'max']).unstack(1)
output:
City
max first_frequent_city second_frequent_city third_frequent_city
ID
1 London New York Berlin
2 Shanghai NaN NaN
Upvotes: 1
Reputation: 28644
A bit long, essentially you groupby twice, first part works on the idea that grouping sorts the data in ascending order, the second part allows us to split the data into individual columns :
(df
.groupby("ID")
.tail(3)
.drop_duplicates()
.groupby("ID")
.agg(",".join)
.City.str.split(",", expand=True)
.set_axis(["first_frequent_city",
"second_frequent_city",
third_frequent_city"],
axis="columns",)
)
first_frequent_city second_frequent_city third_frequent_city
ID
1 London New York Berlin
2 Shanghai None None
Upvotes: 1