matplotlib: scatter plot from string

Question

I have data in this form in a text file:

strings  year  avg
--       --    --
abc      2012  1854
abc      2013  2037
abc      2014  1781
pqr      2011  1346
pqr      2012  1667
xyz      2015  1952

I want to make a scatter plot with (distinct) strings on the x-axis, (distinct) year on the y-axis and the size of marker (circle) should be equal to the avg. I am having trouble implementing it in matplotlib, because the scatter function expects a numerical value for x,y (data positions). Because of that, I am unable to assign strings as x and year as y. Do I need to pre-process this data further?

ImportanceOfBeingErnest · Accepted Answer

Plotting categorical variable scatter with matplotlib >= 2.1

In matplotlib 2.1 and later you can pass the strings directly to the scatter function; they are treated as categories on the x-axis.

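import matplotlib.pyplot as plt

# One entry per data point: category, year, and marker size.
strings = ["abc", "abc", "abc", "pqr", "pqr", "xyz"]
year = list(range(2012, 2018))
avg = [1854, 2037, 1781, 1346, 1667, 1952]

# Strings are accepted directly as categorical x positions;
# s sets the marker area in points squared.
plt.scatter(strings, year, s=avg)

plt.show()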
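
The lists above are typed in by hand. To build them from the text file in the question instead, a minimal sketch could look like the following. It assumes the columns are separated by whitespace and that the first two lines are the header and the "--" separator; `parse_rows` is a hypothetical helper name, and the sample text stands in for the file contents, since the question does not give a file name.

```python
# Sample text in the same layout as the question's file; with a real
# file, read its contents into `text` instead.
text = """\
strings  year  avg
--       --    --
abc      2012  1854
abc      2013  2037
abc      2014  1781
pqr      2011  1346
pqr      2012  1667
xyz      2015  1952
"""


def parse_rows(text):
    """Hypothetical helper: split whitespace-separated rows into three lists."""
    strings, year, avg = [], [], []
    # Skip the header line and the "--" separator line.
    for line in text.splitlines()[2:]:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        strings.append(parts[0])
        year.append(int(parts[1]))
        avg.append(int(parts[2]))
    return strings, year, avg


strings, year, avg = parse_rows(text)
print(strings)
print(year)
print(avg)
```

The resulting `strings`, `year` and `avg` lists can then be passed to `plt.scatter` in the same way as the hand-written ones.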

Plotting categorical variable scatter with matplotlib < 2.1

In matplotlib versions below 2.1 you need to plot the data against a numeric index that corresponds to the categories, then set the tick labels to the category names.

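import matplotlib.pyplot as plt
import numpy as np

strings = ["abc", "abc", "abc", "pqr", "pqr", "xyz"]
year = list(range(2012, 2018))
avg = [1854, 2037, 1781, 1346, 1667, 1952]

# u holds the sorted unique categories; ind gives, for each entry
# of strings, the position of its category in u.
u, ind = np.unique(strings, return_inverse=True)
plt.scatter(ind, year, s=avg)
# Label the integer tick positions with the category names.
plt.xticks(range(len(u)), u)

plt.show()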
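
To make the index step concrete, here is a rough pure-Python sketch of the same category-to-index mapping, without numpy. The name `position` is only illustrative; the sorted order is chosen to match what `np.unique` returns.

```python
strings = ["abc", "abc", "abc", "pqr", "pqr", "xyz"]

# Sorted distinct categories, matching the order np.unique would return.
u = sorted(set(strings))
# Map each category to its integer x position.
position = {name: i for i, name in enumerate(u)}
# One integer per data point, usable as the x argument of scatter.
ind = [position[name] for name in strings]

print(u)    # ['abc', 'pqr', 'xyz']
print(ind)  # [0, 0, 0, 1, 1, 2]
```

Here `ind` plays the role of the x positions and `u` the tick labels, as in the numpy version above.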

[Figure: output in both cases]
