Reputation: 107
I would like to plot a scatter graph of data points of the form (string, string), where each coordinate is taken from a fixed set of string values: one set for the X axis and another for the Y axis. I'm having trouble finding a library (preferably Python) that can plot purely categorical data, with no numeric values.
I tried Seaborn's swarmplot, but it seems at least one coordinate must be numeric.
I know that points with the same two coordinates would collide, and I was hoping to find a library that draws such points next to each other (as a small cluster).
Thanks.
Upvotes: 1
Views: 4136
Reputation: 379
pandas works well for this.
You can create a DataFrame with your categorical variables (note the dtype='category' argument to the DataFrame constructor), get the integer codes for each categorical column, and then scatter plot those codes using pandas itself, matplotlib, or whatever you like.
Example:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'col1': list('abcab'), 'col2': list('acbbb')}, dtype='category')
In [3]: df
Out[3]:
  col1 col2
0    a    a
1    b    c
2    c    b
3    a    b
4    b    b
In [4]: df_num = df.apply(lambda x: x.cat.codes)
In [5]: df_num
Out[5]:
   col1  col2
0     0     0
1     1     2
2     2     1
3     0     1
4     1     1
In [6]: df_num.plot.scatter('col1', 'col2')
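The scatter above shows integer codes on the axes, and identical points (such as the two at ('b', 'b')) are drawn on top of each other. Below is a minimal sketch of one way to handle both, assuming matplotlib and NumPy are available: it adds a small random jitter to the codes and relabels the ticks with the category names. The jitter width of 0.15 and the variable names are illustrative choices, not part of the original answer, and random jitter only approximates the "adjacent points" layout that a swarm plot produces.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': list('abcab'), 'col2': list('acbbb')}, dtype='category')

# Integer codes for each category, as in the example above.
codes = df.apply(lambda s: s.cat.codes)

# Assumed jitter width; kept well below 0.5 so every point stays
# closest to its own category position.
jitter = 0.15
rng = np.random.default_rng(0)
x = codes['col1'] + rng.uniform(-jitter, jitter, len(df))
y = codes['col2'] + rng.uniform(-jitter, jitter, len(df))

fig, ax = plt.subplots()
ax.scatter(x, y)

# Replace the numeric tick labels with the category names.
x_cats = df['col1'].cat.categories
y_cats = df['col2'].cat.categories
ax.set_xticks(np.arange(len(x_cats)))
ax.set_xticklabels(x_cats)
ax.set_yticks(np.arange(len(y_cats)))
ax.set_yticklabels(y_cats)
ax.set_xlabel('col1')
ax.set_ylabel('col2')
```

Call plt.show() or fig.savefig(...) afterwards to view the result. A larger jitter spreads duplicates further apart but makes it harder to tell which category a point belongs to.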
Upvotes: 3