Reputation: 187
I want to plot the top 10 countries (displayed in rows index) according to the values taken for each column.
The columns are the features with which I evaluate the countries : "feature1","feature2","feature3","feature4","feature5","feature6","feature7","feature8","feature9","feature10". So this would be 10 graphs with top 10 ranking. Then I want the "global" top 10 that is taking into account all columns (let's say each column has the same coefficient).
I was thinking about making a new df that shows the most recurring country in these top 10 dfs (the country that appears the most in the "top 10") but don't know how to.
I am struggling so I started by creating new dataframes from the original large dataset named "data_etude" which I made a copy of "date_etude_copy". For each new dataframe "data_ind" I added a new column, to show top 10 based on each feature/column I am analysing (The columns are the features and the rows are the values taken by the countries).
Then I wrote a script to create from these dataframes another dataframe that shows only the top 10 ranking, the values and prameter. I am aware that this is quite laborious and as a beginner I didn't manage to make a loop from this...
the original dataset:
data_etude_copy = data_etude.copy()
dataframes of top 10 countries for each feature (but should do a loop this is so laborious) :
data_ind1 = data_etude_copy.sort_values(by=['feature1'], ascending=False).head(10)
data_ind2 = data_etude_copy.sort_values(by=['feature2'], ascending=False).head(10)
data_ind3 = data_etude_copy.sort_values(by=['feature3'], ascending=False).head(10)
data_ind4 = data_etude_copy.sort_values(by=['feature4'], ascending=False).head(10)
data_ind5 = data_etude_copy.sort_values(by=['feature5'], ascending=False).head(10)
data_ind6 = data_etude_copy.sort_values(by=['feature6'], ascending=False).head(10)
data_ind7 = data_etude_copy.sort_values(by=['feature7'], ascending=False).head(10)
data_ind8 = data_etude_copy.sort_values(by=['feature8'], ascending=False).head(10)
data_ind9 = data_etude_copy.sort_values(by=['feature9'], ascending=False).head(10)
data_ind10 = data_etude_copy.sort_values(by=['feature10'], ascending=False).head(10)
and simplified dfs with top 10 for each feature (i need a loop I know...)
data_ind1.drop(data_ind1.loc[:,data_ind1.columns!="feature1"], inplace=True, axis = 1)
data_ind2.drop(data_ind2.loc[:,data_ind2.columns!="feature2"], inplace=True, axis = 1)
data_ind3.drop(data_ind3.loc[:,data_ind3.columns!="feature3"], inplace=True, axis = 1)
data_ind4.drop(data_ind4.loc[:,data_ind4.columns!="feature4"], inplace=True, axis = 1)
data_ind5.drop(data_ind5.loc[:,data_ind5.columns!="feature5"], inplace=True, axis = 1)
data_ind6.drop(data_ind6.loc[:,data_ind6.columns!="feature6"], inplace=True, axis = 1)
data_ind7.drop(data_ind7.loc[:,data_ind7.columns!="feature7"], inplace=True, axis = 1)
data_ind8.drop(data_ind8.loc[:,data_ind8.columns!="feature8"], inplace=True, axis = 1)
data_ind9.drop(data_ind9.loc[:,data_ind9.columns!="feature9"], inplace=True, axis = 1)
data_ind10.drop(data_ind3.loc[:,data_ind10.columns!="feature10"], inplace=True, axis = 1)
How could I make this into a loop and plot the aimed result? That is to say:
-plotting top 10 countries for each features
-then a final "top 10 countries" taking into account all 10 features (eather with countries that appears the most in each df or countries with best ranking if all features have same coefficient value)?
Upvotes: 0
Views: 2180
Reputation: 76
I think this is what you're asking for? I put your code into a for loop form and added code for ranking the countries overall. The overall ranking is based on all features, not just the top 10 lists but if you'd like it the other way then just switch the order of the commented blocks in the first for loop. I also wasn't sure how you wanted to display it so currently it just prints the final dataframe. It's probably not the cleanest code ever but I hope it helps!
import pandas as pd
import numpy as np
data = np.random.randint(100,size=(12,10))
countries = [
'Country1',
'Country2',
'Country3',
'Country4',
'Country5',
'Country6',
'Country7',
'Country8',
'Country9',
'Country10',
'Country11',
'Country12',
]
feature_names_weights = {
'feature1' :1.0,
'feature2' :1.0,
'feature3' :1.0,
'feature4' :1.0,
'feature5' :1.0,
'feature6' :1.0,
'feature7' :1.0,
'feature8' :1.0,
'feature9' :1.0,
'feature10' :1.0,
}
feature_names = list(feature_names_weights.keys())
df = pd.DataFrame(data=data, index=countries, columns=feature_names)
data_etude_copy = df
data_sorted_by_feature = {}
country_scores = (pd.DataFrame(data=np.zeros(len(countries)),index=countries))[0]
for feature in feature_names:
#Adds to each country's score and multiplies by weight factor for each feature
for country in countries:
country_scores[country] += data_etude_copy[feature][country]*(feature_names_weights[feature])
#Sorts the countries by feature (your code in loop form)
data_sorted_by_feature[feature] = data_etude_copy.sort_values(by=[feature], ascending=False).head(10)
data_sorted_by_feature[feature].drop(data_sorted_by_feature[feature].loc[:,data_sorted_by_feature[feature].columns!=feature], inplace=True, axis = 1)
#sort country total scores
ranked_countries = country_scores.sort_values(ascending=False).head(10)
##Put everything into one DataFrame
#Create empty DataFrame
empty_data=np.empty((10,11),str)
outputDF = pd.DataFrame(data=empty_data,columns=((feature_names)+['Overall']))
#Add entries for all features
for feature in feature_names:
for index in range(10):
country = list(data_sorted_by_feature[feature].index)[index]
outputDF[feature][index] = f'{country}: {data_sorted_by_feature[feature][feature][country]}'
#Add column for overall country score
for index in range(10):
country = list(ranked_countries.index)[index]
outputDF['Overall'][index] = f'{country}: {ranked_countries[country]}'
#Print DataFrame
print(outputDF)
Example data in:
feature1 feature2 feature3 feature4 feature5 feature6 feature7 feature8 feature9 feature10
Country1 40 31 5 6 4 67 65 57 52 96
Country2 93 20 41 65 44 21 91 25 43 75
Country3 93 34 87 69 0 25 65 71 17 91
Country4 24 20 41 68 46 1 94 87 11 97
Country5 90 21 93 0 72 20 44 87 16 42
Country6 93 17 33 40 96 53 1 97 51 20
Country7 82 50 34 27 44 38 49 85 7 70
Country8 33 81 14 5 72 13 13 53 39 47
Country9 18 38 20 32 52 96 51 93 53 16
Country10 75 94 91 59 39 24 7 0 96 57
Country11 62 9 33 89 5 77 37 63 42 29
Country12 7 98 43 71 98 81 48 13 61 69
Corresponding output:
feature1 feature2 feature3 feature4 feature5 feature6 feature7 feature8 feature9 feature10 Overall
0 Country2: 93 Country12: 98 Country5: 93 Country11: 89 Country12: 98 Country9: 96 Country4: 94 Country6: 97 Country10: 96 Country4: 97 Country12: 589.0
1 Country3: 93 Country10: 94 Country10: 91 Country12: 71 Country6: 96 Country12: 81 Country2: 91 Country9: 93 Country12: 61 Country1: 96 Country3: 552.0
2 Country6: 93 Country8: 81 Country3: 87 Country3: 69 Country5: 72 Country11: 77 Country1: 65 Country4: 87 Country9: 53 Country3: 91 Country10: 542.0
3 Country5: 90 Country7: 50 Country12: 43 Country4: 68 Country8: 72 Country1: 67 Country3: 65 Country5: 87 Country1: 52 Country2: 75 Country2: 518.0
4 Country7: 82 Country9: 38 Country2: 41 Country2: 65 Country9: 52 Country6: 53 Country9: 51 Country7: 85 Country6: 51 Country7: 70 Country6: 501.0
5 Country10: 75 Country3: 34 Country4: 41 Country10: 59 Country4: 46 Country7: 38 Country7: 49 Country3: 71 Country2: 43 Country12: 69 Country4: 489.0
6 Country11: 62 Country1: 31 Country7: 34 Country6: 40 Country2: 44 Country3: 25 Country12: 48 Country11: 63 Country11: 42 Country10: 57 Country7: 486.0
7 Country1: 40 Country5: 21 Country6: 33 Country9: 32 Country7: 44 Country10: 24 Country5: 44 Country1: 57 Country8: 39 Country8: 47 Country5: 485.0
8 Country8: 33 Country2: 20 Country11: 33 Country7: 27 Country10: 39 Country2: 21 Country11: 37 Country8: 53 Country3: 17 Country5: 42 Country9: 469.0
9 Country4: 24 Country4: 20 Country9: 20 Country1: 6 Country11: 5 Country5: 20 Country8: 13 Country2: 25 Country5: 16 Country11: 29 Country11: 446.0
Upvotes: 1