Reputation: 5498
The data frame I am working with has three columns named list1
, list2
and list3
based on three separate lists of "best places to live".
Desired output:
I want to return another column, series or groupby which shows the overall rank for each city after it takes into account position across all lists so Edingburgh
would top the list and others would follow based on how close they are in terms of ranking to the top of each column. To clarify, Edingburgh is ranked 1st in list2
and list3
.
It will look something like this:
1 Edingburgh
2 Hart
3 Orkney, London, Solihull
4 Rutland, Bristol Hertfordshire
5 Wychavon, Newcastle, Northumberland
Basically I want to see the overall ranking for each city when all lists have been taken into account and learn how this can be achieved with Pandas.
What have I tried?
I was hoping there would be a simple way to rank using something like places2live.rank()
but do not see how I could use this with string values.
Data
list1 list2 list3
0 Hart Edinburgh Edinburgh
1 Orkney London Solihull
2 Rutland Bristol Hertfordshire
3 Wychavon Newcastle Northumberland
4 Winchester Manchester South Lanarkshire
5 Wokingham Glasgow Berkshire
6 Waverley Leeds Darlington
7 Craven Cardiff North Lanarkshire
Upvotes: 1
Views: 868
Reputation: 18647
Here is one approach:
cities = pd.Index(np.unique(df.values))
ranks = pd.Series([1] * len(cities), index=cities)
for column in df:
ranks = ((ranks + df.reset_index().set_index(column)['index'])/2).fillna(ranks)
city_ranks = ranks.reset_index().groupby(0)['index'].apply(list).reset_index(drop=True)
city_ranks.index += 1
print(city_ranks)
[output]
1 [Edinburgh]
2 [Hart]
3 [London, Orkney, Solihull]
4 [Bristol, Hertfordshire, Rutland]
5 [Newcastle, Northumberland, Wychavon]
6 [Manchester, South Lanarkshire, Winchester]
7 [Berkshire, Glasgow, Wokingham]
8 [Darlington, Leeds, Waverley]
9 [Cardiff, Craven, North Lanarkshire]
Upvotes: 1