Rank multiple string columns using Pandas

Question

The data frame I am working with has three columns named list1, list2 and list3 based on three separate lists of "best places to live".

Desired output:

I want to return another column, series or groupby which shows the overall rank for each city after it takes into account position across all lists so Edingburgh would top the list and others would follow based on how close they are in terms of ranking to the top of each column. To clarify, Edingburgh is ranked 1st in list2 and list3.

It will look something like this:

1 Edingburgh 
2 Hart 
3 Orkney, London, Solihull 
4 Rutland, Bristol Hertfordshire 
5 Wychavon, Newcastle, Northumberland

Basically I want to see the overall ranking for each city when all lists have been taken into account and learn how this can be achieved with Pandas.

What have I tried?

I was hoping there would be a simple way to rank using something like places2live.rank() but do not see how I could use this with string values.

Data

    list1      list2        list3
0   Hart       Edinburgh    Edinburgh
1   Orkney     London       Solihull
2   Rutland    Bristol      Hertfordshire
3   Wychavon   Newcastle    Northumberland
4   Winchester Manchester   South Lanarkshire
5   Wokingham  Glasgow      Berkshire
6   Waverley   Leeds        Darlington
7   Craven     Cardiff      North Lanarkshire

Chris Adams · Accepted Answer

Here is one approach:

cities = pd.Index(np.unique(df.values))
ranks = pd.Series([1] * len(cities), index=cities)

for column in df:
    ranks = ((ranks + df.reset_index().set_index(column)['index'])/2).fillna(ranks)

city_ranks = ranks.reset_index().groupby(0)['index'].apply(list).reset_index(drop=True)
city_ranks.index += 1
print(city_ranks)

[output]

1                                    [Edinburgh]
2                                         [Hart]
3                     [London, Orkney, Solihull]
4              [Bristol, Hertfordshire, Rutland]
5          [Newcastle, Northumberland, Wychavon]
6    [Manchester, South Lanarkshire, Winchester]
7                [Berkshire, Glasgow, Wokingham]
8                  [Darlington, Leeds, Waverley]
9           [Cardiff, Craven, North Lanarkshire]

Rank multiple string columns using Pandas

Answers (1)

Related Questions