Find maximum occurence value across multiple Column (say Two) in Dataframe

Question

I have created a dataframe (df1), and have column HostCity1 and HostCity2, I would like to know which city across these two column has maximum occurence? In this case it is London, but how to identify and assign it to a object (say city_max)

***import pandas as pd
  olympic_data_list={'HostCity1':['London','Beijing','Athens'],'Year1':[2012,2008,2004],'HostCity2':['London','Sydney','Atlanta'],'Year2':[1948,2000,1996]}
df1=pd.DataFrame(olympic_data_list)
print(df1)***

The output is:

    HostCity1   Year1   HostCity2   Year2
0   London      2012    London      1948
1   Beijing     2008    Sydney      2000
2   Athens      2004    Atlanta     1996

jezrael · Accepted Answer

Use DataFrame.filter for get columns with HostCity, then reshape by DataFrame.stack for Series and count values by Series.value_counts - by default output is sorted, so for top1 value select first index value by indexing [0]:

city_max = df1.filter(like='HostCity').stack().value_counts().index[0]
print (city_max)
London

Details:

print (df1.filter(like='HostCity'))
  HostCity1 HostCity2
0    London    London
1   Beijing    Sydney
2    Athens   Atlanta

print (df1.filter(like='HostCity').stack())
0  HostCity1     London
   HostCity2     London
1  HostCity1    Beijing
   HostCity2     Sydney
2  HostCity1     Athens
   HostCity2    Atlanta
dtype: object

print (df1.filter(like='HostCity').stack().value_counts())
London     2
Beijing    1
Athens     1
Atlanta    1
Sydney     1
dtype: int64

Another solution with DataFrame.melt for unpivot:

city_max = df1.filter(like='HostCity').melt()['value'].value_counts().index[0]

Find maximum occurence value across multiple Column (say Two) in Dataframe

Answers (1)

Related Questions