Reputation: 1360
I have data in this form in a text file:
strings year avg
-- -- --
abc 2012 1854
abc 2013 2037
abc 2014 1781
pqr 2011 1346
pqr 2012 1667
xyz 2015 1952
I want to make a scatter plot with (distinct) strings on the x-axis, (distinct) year on the y-axis and the size of marker (circle) should be equal to the avg. I am having trouble implementing it in matplotlib, because the scatter function expects a numerical value for x,y (data positions). Because of that, I am unable to assign strings as x and year as y. Do I need to pre-process this data further?
Upvotes: 2
Views: 8658
Reputation: 578
I wanted the same thing and found an easier way: Seaborn, a plotting library built on Matplotlib.
You can put the text on one axis and the year on the other. To frame the data tightly, you can set the axis limits to the range of years in the data. Let's call your dataframe 'df'.
import seaborn as sns
# range of years, used below as the y-axis limits
minYear = df['year'].min()
maxYear = df['year'].max()
# column names must be passed as strings
pl = sns.catplot(x='strings', y='year', data=df)
pl.set(ylim=(minYear, maxYear))
This draws one point per (strings, year) row, with the y-axis limited to the years present in your data.
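The snippet above assumes 'df' already exists and does not use the avg column. As a rough sketch, here is one way the text file from the question could be loaded and the marker size tied to avg. The names `RAW`, `load_frame` and `plot_sized` are made up for illustration, the `sizes=(50, 500)` range is an arbitrary choice, and `sns.scatterplot` is swapped in for `catplot` because it accepts a `size` column.

```python
import io
import pandas as pd

# The question's data, inlined here; in practice pass a file path to read_csv.
RAW = """strings year avg
-- -- --
abc 2012 1854
abc 2013 2037
abc 2014 1781
pqr 2011 1346
pqr 2012 1667
xyz 2015 1952
"""

def load_frame(text):
    """Read whitespace-separated columns, skipping the '-- -- --' rule on line 1."""
    return pd.read_csv(io.StringIO(text), sep=r"\s+", skiprows=[1])

def plot_sized(df):
    """Scatter strings vs. year, with marker size driven by the avg column."""
    import seaborn as sns
    import matplotlib.pyplot as plt
    ax = sns.scatterplot(data=df, x="strings", y="year",
                         size="avg", sizes=(50, 500))
    # show only the years that actually occur, as whole numbers
    ax.set_yticks(sorted(df["year"].unique()))
    plt.show()
```

Calling `plot_sized(load_frame(RAW))` should display the figure. Note that `sizes` rescales avg into a marker-size range rather than using the raw values.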
Upvotes: 1
Reputation: 339340
In matplotlib 2.1 and later, you can pass the strings directly to the scatter function.
import matplotlib.pyplot as plt
strings = ["abc", "abc", "abc", "pqr", "pqr", "xyz"]
year = list(range(2012, 2018))  # sample years 2012..2017
avg = [1854, 2037, 1781, 1346, 1667, 1952]
# strings are treated as categories on the x-axis; s sets the marker area
plt.scatter(strings, year, s=avg)
plt.show()
In matplotlib versions below 2.1, you need to plot the data against an index that corresponds to the categories, then set the tick labels accordingly.
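import matplotlib.pyplot as plt
import numpy as np
strings = ["abc", "abc", "abc", "pqr", "pqr", "xyz"]
year = list(range(2012, 2018))  # sample years 2012..2017
avg = [1854, 2037, 1781, 1346, 1667, 1952]
# u: sorted distinct strings; ind: position in u of each entry of strings
u, ind = np.unique(strings, return_inverse=True)
plt.scatter(ind, year, s=avg)
# label the integer positions with the category names
plt.xticks(range(len(u)), u)
plt.show()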
Both versions produce the same plot.
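If the index trick in the pre-2.1 version is unclear, the same mapping can be sketched in plain Python. The helper name `category_positions` is hypothetical; for this list it returns what `np.unique(strings, return_inverse=True)` does, namely the sorted distinct labels and, for each input, the position of its label.

```python
def category_positions(labels):
    """Return (sorted distinct labels, position of each input label in that list)."""
    unique = sorted(set(labels))
    lookup = {label: i for i, label in enumerate(unique)}
    return unique, [lookup[label] for label in labels]

strings = ["abc", "abc", "abc", "pqr", "pqr", "xyz"]
u, ind = category_positions(strings)
print(u)    # ['abc', 'pqr', 'xyz']
print(ind)  # [0, 0, 0, 1, 1, 2]
```

Here `ind` is what gets passed to scatter as the x positions, and `u` supplies the tick labels for positions 0, 1 and 2.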
Upvotes: 4