yoopiyo
yoopiyo

Reputation: 187

How to plot top ten for each column python

I want to plot the top 10 countries (displayed in rows index) according to the values taken for each column.

The columns are the features with which I evaluate the countries : "feature1","feature2","feature3","feature4","feature5","feature6","feature7","feature8","feature9","feature10". So this would be 10 graphs with top 10 ranking. Then I want the "global" top 10 that is taking into account all columns (let's say each column has the same coefficient).

I was thinking about making a new df that shows the most recurring country in these top 10 dfs (the country that appears the most in the "top 10") but don't know how to.

I am struggling so I started by creating new dataframes from the original large dataset named "data_etude" which I made a copy of "date_etude_copy". For each new dataframe "data_ind" I added a new column, to show top 10 based on each feature/column I am analysing (The columns are the features and the rows are the values taken by the countries).

Then I wrote a script to create from these dataframes another dataframe that shows only the top 10 ranking, the values and prameter. I am aware that this is quite laborious and as a beginner I didn't manage to make a loop from this... the original dataset: enter image description here

    data_etude_copy = data_etude.copy()
        

dataframes of top 10 countries for each feature (but should do a loop this is so laborious) :

            data_ind1 = data_etude_copy.sort_values(by=['feature1'], ascending=False).head(10)

            data_ind2 = data_etude_copy.sort_values(by=['feature2'], ascending=False).head(10)

            data_ind3 = data_etude_copy.sort_values(by=['feature3'], ascending=False).head(10)
            
            data_ind4 = data_etude_copy.sort_values(by=['feature4'], ascending=False).head(10)
            
            data_ind5 = data_etude_copy.sort_values(by=['feature5'], ascending=False).head(10)
            
            data_ind6 = data_etude_copy.sort_values(by=['feature6'], ascending=False).head(10)
            
            data_ind7 = data_etude_copy.sort_values(by=['feature7'], ascending=False).head(10)
            
            data_ind8 = data_etude_copy.sort_values(by=['feature8'], ascending=False).head(10)
            
            data_ind9 = data_etude_copy.sort_values(by=['feature9'], ascending=False).head(10)
            
            data_ind10 = data_etude_copy.sort_values(by=['feature10'], ascending=False).head(10)

and simplified dfs with top 10 for each feature (i need a loop I know...)

data_ind1.drop(data_ind1.loc[:,data_ind1.columns!="feature1"], inplace=True, axis = 1)            

data_ind2.drop(data_ind2.loc[:,data_ind2.columns!="feature2"], inplace=True, axis = 1)
           
data_ind3.drop(data_ind3.loc[:,data_ind3.columns!="feature3"], inplace=True, axis = 1)
    
            
data_ind4.drop(data_ind4.loc[:,data_ind4.columns!="feature4"], inplace=True, axis = 1)
    

data_ind5.drop(data_ind5.loc[:,data_ind5.columns!="feature5"], inplace=True, axis = 1)
        
                
data_ind6.drop(data_ind6.loc[:,data_ind6.columns!="feature6"], inplace=True, axis = 1)
    
            
data_ind7.drop(data_ind7.loc[:,data_ind7.columns!="feature7"], inplace=True, axis = 1)
    
            
data_ind8.drop(data_ind8.loc[:,data_ind8.columns!="feature8"], inplace=True, axis = 1)
    
            
data_ind9.drop(data_ind9.loc[:,data_ind9.columns!="feature9"], inplace=True, axis = 1)

data_ind10.drop(data_ind3.loc[:,data_ind10.columns!="feature10"], inplace=True, axis = 1)

How could I make this into a loop and plot the aimed result? That is to say:

-plotting top 10 countries for each features

-then a final "top 10 countries" taking into account all 10 features (eather with countries that appears the most in each df or countries with best ranking if all features have same coefficient value)?

Upvotes: 0

Views: 2180

Answers (1)

Jackson V
Jackson V

Reputation: 76

I think this is what you're asking for? I put your code into a for loop form and added code for ranking the countries overall. The overall ranking is based on all features, not just the top 10 lists but if you'd like it the other way then just switch the order of the commented blocks in the first for loop. I also wasn't sure how you wanted to display it so currently it just prints the final dataframe. It's probably not the cleanest code ever but I hope it helps!

import pandas as pd
import numpy as np

data = np.random.randint(100,size=(12,10))

countries = [
    'Country1',
    'Country2',
    'Country3',
    'Country4',
    'Country5',
    'Country6',
    'Country7',
    'Country8',
    'Country9',
    'Country10',
    'Country11',
    'Country12',
]
feature_names_weights = {
    'feature1'  :1.0,
    'feature2'  :1.0,
    'feature3'  :1.0,
    'feature4'  :1.0,
    'feature5'  :1.0,
    'feature6'  :1.0,
    'feature7'  :1.0,
    'feature8'  :1.0,
    'feature9'  :1.0,
    'feature10' :1.0,
}
feature_names = list(feature_names_weights.keys())

df = pd.DataFrame(data=data, index=countries, columns=feature_names)
data_etude_copy = df

data_sorted_by_feature = {}
country_scores = (pd.DataFrame(data=np.zeros(len(countries)),index=countries))[0]

for feature in feature_names:
    #Adds to each country's score and multiplies by weight factor for each feature
    for country in countries:
        country_scores[country] += data_etude_copy[feature][country]*(feature_names_weights[feature])
    #Sorts the countries by feature (your code in loop form)
    data_sorted_by_feature[feature] = data_etude_copy.sort_values(by=[feature], ascending=False).head(10)
    data_sorted_by_feature[feature].drop(data_sorted_by_feature[feature].loc[:,data_sorted_by_feature[feature].columns!=feature], inplace=True, axis = 1)

#sort country total scores
ranked_countries = country_scores.sort_values(ascending=False).head(10)

##Put everything into one DataFrame
#Create empty DataFrame
empty_data=np.empty((10,11),str)
outputDF = pd.DataFrame(data=empty_data,columns=((feature_names)+['Overall']))
#Add entries for all features
for feature in feature_names:
    for index in range(10):
        country = list(data_sorted_by_feature[feature].index)[index]
        outputDF[feature][index] = f'{country}: {data_sorted_by_feature[feature][feature][country]}'
#Add column for overall country score
for index in range(10):
    country = list(ranked_countries.index)[index]
    outputDF['Overall'][index] = f'{country}: {ranked_countries[country]}'

#Print DataFrame
print(outputDF)

Example data in:

           feature1  feature2  feature3  feature4  feature5  feature6  feature7  feature8  feature9  feature10
Country1         40        31         5         6         4        67        65        57        52         96
Country2         93        20        41        65        44        21        91        25        43         75
Country3         93        34        87        69         0        25        65        71        17         91
Country4         24        20        41        68        46         1        94        87        11         97
Country5         90        21        93         0        72        20        44        87        16         42
Country6         93        17        33        40        96        53         1        97        51         20
Country7         82        50        34        27        44        38        49        85         7         70
Country8         33        81        14         5        72        13        13        53        39         47
Country9         18        38        20        32        52        96        51        93        53         16
Country10        75        94        91        59        39        24         7         0        96         57
Country11        62         9        33        89         5        77        37        63        42         29
Country12         7        98        43        71        98        81        48        13        61         69

Corresponding output:

        feature1       feature2       feature3       feature4       feature5       feature6       feature7       feature8       feature9      feature10           Overall
0   Country2: 93  Country12: 98   Country5: 93  Country11: 89  Country12: 98   Country9: 96   Country4: 94   Country6: 97  Country10: 96   Country4: 97  Country12: 589.0
1   Country3: 93  Country10: 94  Country10: 91  Country12: 71   Country6: 96  Country12: 81   Country2: 91   Country9: 93  Country12: 61   Country1: 96   Country3: 552.0
2   Country6: 93   Country8: 81   Country3: 87   Country3: 69   Country5: 72  Country11: 77   Country1: 65   Country4: 87   Country9: 53   Country3: 91  Country10: 542.0
3   Country5: 90   Country7: 50  Country12: 43   Country4: 68   Country8: 72   Country1: 67   Country3: 65   Country5: 87   Country1: 52   Country2: 75   Country2: 518.0
4   Country7: 82   Country9: 38   Country2: 41   Country2: 65   Country9: 52   Country6: 53   Country9: 51   Country7: 85   Country6: 51   Country7: 70   Country6: 501.0
5  Country10: 75   Country3: 34   Country4: 41  Country10: 59   Country4: 46   Country7: 38   Country7: 49   Country3: 71   Country2: 43  Country12: 69   Country4: 489.0
6  Country11: 62   Country1: 31   Country7: 34   Country6: 40   Country2: 44   Country3: 25  Country12: 48  Country11: 63  Country11: 42  Country10: 57   Country7: 486.0
7   Country1: 40   Country5: 21   Country6: 33   Country9: 32   Country7: 44  Country10: 24   Country5: 44   Country1: 57   Country8: 39   Country8: 47   Country5: 485.0
8   Country8: 33   Country2: 20  Country11: 33   Country7: 27  Country10: 39   Country2: 21  Country11: 37   Country8: 53   Country3: 17   Country5: 42   Country9: 469.0
9   Country4: 24   Country4: 20   Country9: 20    Country1: 6   Country11: 5   Country5: 20   Country8: 13   Country2: 25   Country5: 16  Country11: 29  Country11: 446.0

Upvotes: 1

Related Questions