Reputation: 335
I have a Series object holding the 2013 population of different countries, with the country names as the index.
Example of Input:
Country Name Population in 2013
Aruba 103159.0
Afghanistan 32269589.0
Angola 26015780.0
... ...
Now I want to randomly pick one country and its population. I do it this way.
countr = set(country.name for country in pycountry.countries)
listofcountr=list(countr)
randcountry=random.choice(listofcountr)
Now I want to find the 5 countries whose population is closest to that of the randomly chosen country, where "closest" means the smallest absolute difference. How can I achieve that?
Upvotes: 0
Views: 142
Reputation: 5935
You can calculate the differences between all pairs of countries in an n x n array, sort each row by absolute difference, and then randomly select a row. Assuming you have one array or list of country names, countries, and a matching one of populations, populations:
import numpy as np
populations = np.asarray(populations)
diffs = populations[:, np.newaxis] - populations
order = np.abs(diffs).argsort(axis=1)
Now, randomly select a country:
choice = np.random.randint(0, populations.size)
Then, select the closest five countries:
selection = order[choice, 1:6]
closest_countries = np.asarray(countries)[selection]
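If only one random country is needed, the full n x n matrix can be skipped and a single row of differences computed instead. The following is a minimal sketch under that assumption; the function name closest_countries_to, the parameter k, and the dummy values are made up for illustration. It also removes the chosen country by index rather than by position, which stays correct when another country has exactly the same population.

```python
import numpy as np

def closest_countries_to(countries, populations, choice, k=5):
    """Return the k country names closest in population to countries[choice]."""
    countries = np.asarray(countries)
    populations = np.asarray(populations, dtype=float)
    # Absolute difference between every population and the chosen one.
    dist = np.abs(populations - populations[choice])
    # A stable sort keeps tied countries in their original order.
    order = np.argsort(dist, kind="stable")
    # Drop the chosen country itself, wherever it ended up.
    order = order[order != choice]
    return countries[order[:k]].tolist()

# Dummy values, not real population figures.
countries = ["a", "b", "c", "d", "e", "f", "g", "h"]
populations = [10, 20, 30, 15, 34, 23, 10, 12]

choice = np.random.randint(0, len(populations))
print(countries[choice], closest_countries_to(countries, populations, choice))
```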
Upvotes: 0
Reputation: 1267
Another way to do it, using pandas, could be the following (please note that the values are dummy values):
import pandas as pd
df = pd.DataFrame({'pop':[10,20,30,15,34,23,10,12], 'country':['a','b','c','d','e','f','g','h']})
df = df.set_index('country')
df
pop
country
a 10
b 20
c 30
d 15
e 34
f 23
g 10
h 12
Now if you want to find the 5 countries whose values of pop are closest to that of, say, country b, you could try the following:
df['diff'] = (df['pop'] - df.loc['b', 'pop']).abs()
df[df.index != 'b'].sort_values(['diff']).head(5).index.tolist()
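['f', 'd', 'h', 'a', 'c']
Countries a, c and g are all tied at a difference of 10, so which two of them fill the last two places depends on how the sort breaks ties.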
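Since the question starts from a Series indexed by country name, the same idea can be sketched directly on that Series without adding a helper column. This is only a sketch: the function name closest_by_population, the parameter k, and the dummy values are assumptions made for illustration.

```python
import pandas as pd

def closest_by_population(series, chosen, k=5):
    """Return the k index labels whose values are closest to series[chosen]."""
    # Leave the chosen country out so it cannot match itself.
    others = series.drop(chosen)
    diff = (others - series[chosen]).abs()
    # nsmallest returns the k smallest differences, smallest first.
    return diff.nsmallest(k).index.tolist()

# Dummy values, not real population figures.
pop = pd.Series([10, 20, 30, 15, 34, 23, 10, 12],
                index=list("abcdefgh"), name="pop")

# Pick one country at random from the index.
chosen = pop.sample(1).index[0]
print(chosen, closest_by_population(pop, chosen))
```

Passing the chosen label in as an argument, rather than drawing it inside the function, keeps the lookup easy to check against a known country.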
Upvotes: 0
Reputation: 2378
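You can compute the absolute difference between every country's population and that of the chosen country, save the differences to a list, and sort the list. Here is a non-NumPy version (it assumes listofcountr holds the population values rather than the country names, and it returns the five smallest differences):
import random
randcountry = random.choice(listofcountr)
pop_distance = [abs(randcountry-i) for i in listofcountr]
sorted_list = sorted(pop_distance)
five_closest = sorted_list[1:6]  # skip the first entry: the chosen country itself, at distance 0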
Using NumPy, you can vectorise (speed up) the operations like so:
import random
import numpy as np
randcountry = random.choice(listofcountr)
listofcountr = np.array(listofcountr)
pop_distance = abs(listofcountr - randcountry)  # element-wise absolute difference
five_closest = np.sort(pop_distance)[1:6]  # skip index 0, the chosen country itself
Upvotes: 3
Reputation: 42
As this isn't a large amount of data, you can subtract the random country's population from that of every country on the list, collect the absolute differences in a new list, sort that list, and then print its first 5 elements (or copy just the first 5 records into a new list).
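The steps described above can be sketched in plain Python. This is a minimal sketch that assumes the data is available as a dict mapping country name to population; the name five_closest, the parameter k, and the dummy values are hypothetical. It sorts the country names by their distance to the chosen country, so the result is a list of countries rather than a list of differences.

```python
import heapq
import random

def five_closest(population, chosen, k=5):
    """Return the k country names closest in population to `chosen`."""
    target = population[chosen]
    # Every country except the chosen one.
    others = (name for name in population if name != chosen)
    # nsmallest is equivalent to sorted(others, key=...)[:k].
    return heapq.nsmallest(k, others,
                           key=lambda name: abs(population[name] - target))

# Dummy values, not real population figures.
population = {"a": 10, "b": 20, "c": 30, "d": 15,
              "e": 34, "f": 23, "g": 10, "h": 12}

randcountry = random.choice(list(population))
print(randcountry, five_closest(population, randcountry))
```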
Upvotes: 0