Reputation: 181
I have created a dataframe (df1), and have column HostCity1
and HostCity2
, I would like to know which city across these two column has maximum occurence? In this case it is London, but how to identify and assign it to a object (say city_max)
***import pandas as pd
olympic_data_list={'HostCity1':['London','Beijing','Athens'],'Year1':[2012,2008,2004],'HostCity2':['London','Sydney','Atlanta'],'Year2':[1948,2000,1996]}
df1=pd.DataFrame(olympic_data_list)
print(df1)***
The output is:
HostCity1 Year1 HostCity2 Year2
0 London 2012 London 1948
1 Beijing 2008 Sydney 2000
2 Athens 2004 Atlanta 1996
Upvotes: 1
Views: 51
Reputation: 862671
Use DataFrame.filter
for get columns with HostCity
, then reshape by DataFrame.stack
for Series
and count values by Series.value_counts
- by default output is sorted, so for top1 value select first index value by indexing [0]
:
city_max = df1.filter(like='HostCity').stack().value_counts().index[0]
print (city_max)
London
Details:
print (df1.filter(like='HostCity'))
HostCity1 HostCity2
0 London London
1 Beijing Sydney
2 Athens Atlanta
print (df1.filter(like='HostCity').stack())
0 HostCity1 London
HostCity2 London
1 HostCity1 Beijing
HostCity2 Sydney
2 HostCity1 Athens
HostCity2 Atlanta
dtype: object
print (df1.filter(like='HostCity').stack().value_counts())
London 2
Beijing 1
Athens 1
Atlanta 1
Sydney 1
dtype: int64
Another solution with DataFrame.melt
for unpivot:
city_max = df1.filter(like='HostCity').melt()['value'].value_counts().index[0]
Upvotes: 1