ryszard eggink

Reputation: 335

Extracting 5 countries with closest population

I have a pandas Series holding the 2013 population of different countries, with the country names as the index.

Example of Input:

Country Name     Population in 2013
Aruba            103159.0
Afghanistan      32269589.0
Angola           26015780.0
...              ...

Now I want to randomly pick one country and its population. This is how I do it:

import random
import pycountry

countr = set(country.name for country in pycountry.countries)
listofcountr = list(countr)
randcountry = random.choice(listofcountr)

Next I want to find the 5 countries whose population is closest to that of the randomly chosen country, where "closest" means the smallest absolute difference in population. How can I achieve that?

Upvotes: 0

Views: 142

Answers (4)

Jan Christoph Terasa

Reputation: 5935

You can calculate the pairwise differences between all countries as an n x n array, sort each row, and then randomly select a row. Assuming you have one array or list of country names, countries, and one array or list of populations, populations:

import numpy as np

populations = np.asarray(populations)
diffs = populations[:, np.newaxis] - populations
order = np.abs(diffs).argsort(axis=1)

Now, randomly select a country:

choice = np.random.randint(0, populations.size)

Then select the five closest countries, skipping column 0, which is the chosen country itself (distance 0):

selection = order[choice, 1:6]
closest_countries = np.asarray(countries)[selection]
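
The steps above can be put together as a minimal end-to-end sketch. It assumes the data starts out as a pandas Series indexed by country name; the name population and the dummy values are illustrative, not taken from the question. As a small variation, the diagonal is set to infinity instead of slicing 1:6, so a country is never its own neighbour even when two countries have identical populations.

```python
import numpy as np
import pandas as pd

# Dummy data standing in for the Series from the question (hypothetical values).
population = pd.Series(
    {"a": 10.0, "b": 20.0, "c": 31.0, "d": 15.0,
     "e": 34.0, "f": 23.0, "g": 11.0, "h": 12.0}
)

countries = population.index.to_numpy()
populations = population.to_numpy()

# n x n matrix of absolute differences; row i holds the distances from country i.
diffs = np.abs(populations[:, np.newaxis] - populations)
# Infinity on the diagonal keeps each country out of its own neighbour list.
np.fill_diagonal(diffs, np.inf)
order = diffs.argsort(axis=1)


def closest_five(i):
    """Return the names of the five countries closest in population to country i."""
    return countries[order[i, :5]]


choice = np.random.randint(0, populations.size)
print(countries[choice], closest_five(choice))
```

The n x n matrix is only worth building if you need neighbours for many countries; for a single random pick, one row of differences is enough.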

Upvotes: 0

Sajan

Reputation: 1267

Another way to do it, using pandas, is the following (note that the values are dummy values):

import pandas as pd

df = pd.DataFrame({'pop':[10,20,30,15,34,23,10,12], 'country':['a','b','c','d','e','f','g','h']})
df = df.set_index('country')
df
         pop
country
a         10
b         20
c         30
d         15
e         34
f         23
g         10
h         12

Now, to find the 5 countries whose pop values are closest to that of, say, country b, you could try the following:

df['diff'] = (df['pop'] - df.loc['b', 'pop']).abs()
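df[df.index != 'b'].sort_values('diff', kind='mergesort').head(5).index.tolist()
['f', 'd', 'h', 'a', 'c']

Countries a, c and g are all 10 away from b, so they tie for the last two places; the stable sort (kind='mergesort') keeps tied rows in their original order, which is why a and c are returned.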
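
Since the question starts from a Series rather than a DataFrame, the same idea can be sketched directly on a Series. This is only an illustration: the name population, the helper closest_countries and the dummy values are assumptions, not part of the original data. Series.nsmallest returns the k smallest distances, so no explicit sort is needed.

```python
import pandas as pd

# Dummy Series shaped like the one in the question: index = country, values = population.
population = pd.Series(
    {"a": 10, "b": 20, "c": 31, "d": 15, "e": 34, "f": 23, "g": 11, "h": 12}
)


def closest_countries(series, chosen, k=5):
    """Return the k index labels whose values are nearest to series[chosen]."""
    # Drop the chosen country first so it cannot be returned as its own neighbour.
    distance = (series.drop(chosen) - series[chosen]).abs()
    return distance.nsmallest(k).index.tolist()


# Pick a random country straight from the Series index.
random_country = population.sample(1).index[0]
print(random_country, closest_countries(population, random_country))
```

Sampling from the Series itself, rather than from a separate list of country names, guarantees that the chosen country actually has a population entry.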

Upvotes: 0

ccl

Reputation: 2378

You can compute the absolute difference between every country's population and that of the chosen country, and then sort the countries by that difference. Here is a non-NumPy version:

import random

# population: the Series from the question (country name -> population)
randcountry = random.choice(list(population.index))
pop_distance = {name: abs(pop - population[randcountry])
                for name, pop in population.items() if name != randcountry}
five_closest = sorted(pop_distance, key=pop_distance.get)[:5]  # the chosen country is already excluded

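Using NumPy, you can vectorise (and so speed up) the same computation:

import numpy as np

# population: the Series from the question (country name -> population)
names = population.index.to_numpy()
pops = population.to_numpy(dtype=float)
choice = np.random.randint(pops.size)
pop_distance = np.abs(pops - pops[choice])
pop_distance[choice] = np.inf  # exclude the chosen country itself
five_closest = names[np.argsort(pop_distance)[:5]]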
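
When only the five nearest entries are needed, a full sort is more work than necessary. The following sketch uses np.argpartition to pick the k smallest distances and then orders just those k. The arrays names and pops and the helper five_closest_partition are hypothetical stand-ins with dummy values.

```python
import numpy as np

# Dummy data: parallel arrays of country names and populations (hypothetical values).
names = np.array(["a", "b", "c", "d", "e", "f", "g", "h"])
pops = np.array([10.0, 20.0, 31.0, 15.0, 34.0, 23.0, 11.0, 12.0])


def five_closest_partition(names, pops, choice, k=5):
    """Return the k names whose populations are nearest to pops[choice]."""
    distance = np.abs(pops - pops[choice])
    distance[choice] = np.inf  # keep the chosen country out of the result
    # argpartition places the indices of the k smallest distances first, in no particular order.
    nearest = np.argpartition(distance, k - 1)[:k]
    # Order only those k indices by their distance.
    nearest = nearest[np.argsort(distance[nearest])]
    return names[nearest]


choice = np.random.randint(pops.size)
print(names[choice], five_closest_partition(names, pops, choice))
```

For a few hundred countries the difference is negligible, so this mainly matters for much larger arrays.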

Upvotes: 3

Karol Workowski

Reputation: 42

Since this isn't a large amount of data, you can compute the difference between the random country's population and that of every other country on the list, add those differences to a new list, sort that list, and then either print its first 5 elements or copy just the first 5 records into a new list.
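
The steps described above could be sketched in plain Python as follows. The dictionary population, its dummy values and the helper closest_by_difference are illustrative assumptions; each difference is stored together with its country name so the names are still available after sorting.

```python
import random

# Dummy data: country name -> population (hypothetical values).
population = {"a": 10, "b": 20, "c": 31, "d": 15, "e": 34, "f": 23, "g": 11, "h": 12}


def closest_by_difference(population, chosen, k=5):
    """Return the k country names whose population is nearest to that of chosen."""
    differences = []
    for name, pop in population.items():
        if name != chosen:
            # Store (difference, name) so sorting orders by difference first.
            differences.append((abs(population[chosen] - pop), name))
    differences.sort()
    return [name for _, name in differences[:k]]


random_country = random.choice(list(population))
print(random_country, closest_by_difference(population, random_country))
```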

Upvotes: 0
