Saad

Reputation: 1360

matplotlib: scatter plot from string

I have data in this form in a text file:

strings  year  avg
--       --    --
abc      2012  1854
abc      2013  2037
abc      2014  1781
pqr      2011  1346
pqr      2012  1667
xyz      2015  1952

I want to make a scatter plot with the (distinct) strings on the x-axis, the (distinct) years on the y-axis, and the size of each marker (circle) set by avg. I am having trouble implementing this in matplotlib because the scatter function expects numerical values for x and y (the data positions), so I cannot pass the strings as x and the years as y. Do I need to pre-process this data further?

Upvotes: 2

Views: 8658

Answers (2)

Natty

Reputation: 578

I wanted the same thing and found an easier way: you can use Seaborn, a library built on top of Matplotlib.

You can put the strings on one axis and the year on the other. To make the best use of the plot area, you can set the axis limits to the range of the data. Let's call your DataFrame 'df':

import seaborn as sns

# Use the range of years as the y-axis limits
minYear = df['year'].min()
maxYear = df['year'].max()
# Column names must be passed as strings
pl = sns.catplot(x='strings', y='year', data=df)
pl.set(ylim=(minYear, maxYear))

This produces a categorical scatter plot of year against strings. Note that this snippet does not scale the markers by avg.
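The snippet above assumes the data is already loaded. If it still sits in a whitespace-separated text file as shown in the question, a minimal sketch for reading it could look like the following. The helper name `read_columns` and the file name mentioned in the comment are hypothetical, and the layout (one header line, one dashed rule, then three columns per row) is assumed from the question.

```python
import io

# Sample text in the layout shown in the question (header, dashed rule, rows).
SAMPLE = """\
strings  year  avg
--       --    --
abc      2012  1854
abc      2013  2037
abc      2014  1781
pqr      2011  1346
pqr      2012  1667
xyz      2015  1952
"""


def read_columns(handle):
    """Parse whitespace-separated rows into a dict of column lists."""
    columns = {"strings": [], "year": [], "avg": []}
    for line_number, line in enumerate(handle):
        parts = line.split()
        # Skip the header line, the dashed rule, and any malformed or blank line.
        if line_number < 2 or len(parts) != 3:
            continue
        columns["strings"].append(parts[0])
        columns["year"].append(int(parts[1]))
        columns["avg"].append(int(parts[2]))
    return columns


# io.StringIO stands in for open("data.txt"); that file name is hypothetical.
columns = read_columns(io.StringIO(SAMPLE))
```

Passing the resulting dict to `pandas.DataFrame(columns)` should give a DataFrame with the column names 'strings', 'year' and 'avg' used above.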

Upvotes: 1

ImportanceOfBeingErnest

Reputation: 339340

Plotting categorical variable scatter with matplotlib >=2.1

In matplotlib 2.1 and later you can pass the strings directly to the scatter function.

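import matplotlib.pyplot as plt

strings = ["abc", "abc", "abc", "pqr", "pqr", "xyz"]
year = list(range(2012, 2018))  # six example years, one per data point
avg = [1854, 2037, 1781, 1346, 1667, 1952]

# The strings are used directly as categorical x positions; s sets the marker size
plt.scatter(strings, year, s=avg)

plt.show()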
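One thing to keep in mind: scatter interprets s as the marker area in points squared, so passing avg directly gives fairly large circles. If they overlap too much, one option is to rescale the values before plotting. This is only a sketch; the helper name `scale_sizes` and the default `max_area` of 300 are arbitrary choices, not part of the original answer.

```python
def scale_sizes(values, max_area=300.0):
    """Rescale values so that the largest one maps to max_area (points squared)."""
    largest = max(values)
    return [max_area * v / largest for v in values]


avg = [1854, 2037, 1781, 1346, 1667, 1952]
sizes = scale_sizes(avg)
```

The rescaled list can then be passed in place of avg, for example `plt.scatter(strings, year, s=sizes)`. The relative areas of the markers stay proportional to avg.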

Plotting categorical variable scatter with matplotlib < 2.1

In matplotlib versions below 2.1 you need to plot the data against a numeric index that corresponds to the categories, then set the tick labels to the category names.

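import matplotlib.pyplot as plt
import numpy as np

strings = ["abc", "abc", "abc", "pqr", "pqr", "xyz"]
year = list(range(2012, 2018))  # six example years, one per data point
avg = [1854, 2037, 1781, 1346, 1667, 1952]

# u: sorted unique categories; ind: index into u for each entry of strings
u, ind = np.unique(strings, return_inverse=True)
plt.scatter(ind, year, s=avg)
# Label the integer positions 0..len(u)-1 with the category names
plt.xticks(range(len(u)), u)

plt.show()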
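To see what the index mapping in the snippet above looks like, it can help to inspect the two arrays that np.unique returns. A small sketch using the same strings list:

```python
import numpy as np

strings = ["abc", "abc", "abc", "pqr", "pqr", "xyz"]

# u holds the sorted distinct categories; ind gives, for each entry of
# strings, the position of its category within u.
u, ind = np.unique(strings, return_inverse=True)

print(u.tolist())    # ['abc', 'pqr', 'xyz']
print(ind.tolist())  # [0, 0, 0, 1, 1, 2]
```

Since ind contains plain integers, scatter accepts it as x data in any matplotlib version, and indexing u with ind recovers the original strings.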

Both versions produce the same plot (image not included here).

Upvotes: 4
