John Smith

Reputation: 2886

Python translate matplotlib to a plotnine chart

I am currently working through the book Hands-On Machine Learning and am trying to replicate a visualization that plots latitude and longitude coordinates on a scatter plot of San Diego. I took the plot code from the book (the matplotlib method below) and would like to reproduce the same visualization using plotnine. Could someone help me with the translation?

matplotlib method

# DATA INGEST -------------------------------------------------------------
import io

import pandas as pd
import requests

# Import the file from GitHub (the URL must point to the raw version of the file)
url = "https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.csv"
download = requests.get(url).content

# Reading the downloaded content and turning it into a pandas dataframe
housing = pd.read_csv(io.StringIO(download.decode('utf-8')))

# Then plot
import matplotlib.pyplot as plt

# The size is now related to population divided by 100
# the colour is related to the median house value
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4, 
              s=housing["population"]/100, label="population", figsize=(10,7),
              c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True)
plt.legend()
plt.show()

plotnine method

from plotnine import ggplot, geom_point, aes, stat_smooth, scale_color_cmap

# Let's try the same thing in plotnine
(ggplot(housing, aes('longitude', 'latitude', size = "population", color = "median_house_value"))
 + geom_point(alpha = 0.1)
 + scale_color_cmap(name="jet"))
 

Upvotes: 2

Views: 547

Answers (1)

brb

Reputation: 1179

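If your question was about the colour mapping, you were close: scale_color_cmap needs cmap_name='jet' instead of name='jet'. In plotnine, name sets the title of the scale's legend, so the colormap itself was left at its default.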

If it is a broader styling thing, below is close to what you had with matplotlib.

matplotlib method (image not shown)

plotnine method (image not shown)

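import numpy as np
from plotnine import (
    aes, annotate, element_blank, element_text, geom_point, ggplot, guides,
    labs, scale_color_cmap, scale_size_continuous, scale_y_continuous,
    theme, theme_matplotlib,
)

# housing is the DataFrame loaded in the question
p = (ggplot(housing, aes(x='longitude', y='latitude', size='population', color='median_house_value'))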
  + theme_matplotlib()
  + geom_point(alpha=0.4)
  + annotate('text', x=-114.6, y=42, label='population', size=8)
  + annotate('point', x=-115.65, y=42, size=5, color='#6495ED', fill='#6495ED', alpha=0.8)
  + labs(x=None, color='Median house value')
  + scale_y_continuous(breaks=np.arange(34,44,2))
  + scale_color_cmap(cmap_name='jet')
  + scale_size_continuous(range=(0.05, 6))
  + guides(size=False)
  + theme(
        text = element_text(family='DejaVu Sans', size=8),
        axis_text_x = element_blank(),
        axis_ticks_minor=element_blank(),
        legend_key_height = 34,
        legend_key_width = 9,        
  )
 )
p

I am not sure how far the formatting of the colour bar can be customised in plotnine. If others have additional ideas, I would be most interested, as I think the matplotlib colour bar looks nicer.

Upvotes: 2
