Reputation: 424
My name is Luis Francisco Gomez, and I am taking the course Intermediate Python > 1 Matplotlib > Sizes, which belongs to Data Scientist with Python on DataCamp. I am reproducing the course exercises; in this part you have to make a scatter plot in which the size of each point corresponds to the population of the country. I tried to reproduce DataCamp's results with this code:
# load subpackage
import matplotlib.pyplot as plt
## load other libraries
import pandas as pd
import numpy as np
## import data
gapminder = pd.read_csv("https://assets.datacamp.com/production/repositories/287/datasets/5b1e4356f9fa5b5ce32e9bd2b75c777284819cca/gapminder.csv")
gdp_cap = gapminder["gdp_cap"].tolist()
life_exp = gapminder["life_exp"].tolist()
# create an np array that contains the population
pop = gapminder["population"].tolist()
pop_np = np.array(pop)
plt.scatter(gdp_cap, life_exp, s = pop_np*2)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])
# Display the plot
plt.show()
However, I get this:
But in theory you should get this:
I don't understand what the problem is with the argument s in plt.scatter.
Upvotes: 2
Views: 192
Reputation: 150805
This is because your sizes are too large; scale them down. Also, there's no need to create all the intermediate lists and arrays:
plt.scatter(gapminder.gdp_cap,
            gapminder.life_exp,
            s=gapminder.population/1e6)
Output:
Upvotes: 1
Reputation: 353
I think you should use
plt.scatter(gdp_cap, life_exp, s = np.array(gdp_cap)*2)
or scale down pop_np instead.
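A note on expressions such as gdp_cap*2: in the question, gdp_cap comes from .tolist() and is a plain Python list, so multiplying it by 2 repeats the list instead of doubling each value. Converting to a NumPy array first gives elementwise multiplication. A minimal sketch with made-up stand-in values (not taken from the gapminder data):

```python
import numpy as np

# Stand-in values for illustration only; not from the gapminder dataset.
values = [1.0, 2.0, 3.0]

# Multiplying a list by an integer repeats it: the result has 6 elements.
repeated = values * 2

# Multiplying a NumPy array by a number scales each element instead.
doubled = np.array(values) * 2

print(len(repeated), len(doubled))
```

The repeated list no longer has one entry per point, which is why a list cannot be passed as s this way.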
Upvotes: 0
Reputation: 153510
You need to scale your s:
plt.scatter(gdp_cap, life_exp, s = pop_np*2/1000000)
Per the docs, s is the marker size in points**2.
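To get a feel for why the unscaled sizes fill the whole figure, here is a rough back-of-the-envelope sketch. It assumes that the square root of s is approximately the marker width in points and that a point is 1/72 inch; the population figure is an illustrative round number, not a value read from the dataset, and marker_width_points is a hypothetical helper name.

```python
import math

POINTS_PER_INCH = 72  # a typographic point is 1/72 inch


def marker_width_points(s):
    """Approximate marker width in points for a size s given in points**2."""
    return math.sqrt(s)


# Illustrative round figure for a very large country (assumption).
largest_pop = 1.3e9

raw_width = marker_width_points(largest_pop * 2)
scaled_width = marker_width_points(largest_pop * 2 / 1_000_000)

print(f"unscaled: about {raw_width / POINTS_PER_INCH:.0f} inches wide")
print(f"scaled:   about {scaled_width / POINTS_PER_INCH:.2f} inches wide")
```

Under these assumptions the unscaled marker would be hundreds of inches across, far larger than any figure, while dividing by 1,000,000 brings it down to under an inch. Because the width grows with the square root of s, dividing s by a million shrinks the width by a factor of a thousand.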
Upvotes: 2