elelias

Reputation: 4771

scatter plots with string arrays in matplotlib

This seems like it should be an easy one, but I can't figure it out. I have a pandas DataFrame and would like to make a 3D scatter plot from three of its columns. The X and Y columns are strings rather than numbers, but I don't see why this should be a problem.

X = myDataFrame.columnX.values  # string
Y = myDataFrame.columnY.values  # string
Z = myDataFrame.columnZ.values  # float

fig = pl.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
pl.show()

Isn't there an easy way to do this? Thanks.

Upvotes: 10

Views: 12437

Answers (3)

naught101

Reputation: 19553

Scatter does this automatically now (from at least matplotlib 2.1.0):

import matplotlib.pyplot as plt
plt.scatter(['A', 'B', 'B', 'C'], [0, 1, 2, 1])

[Figure: scatter plot]

Upvotes: 3

jmetz

Reputation: 12773

Try converting the characters to numbers for the plotting and then use the characters again for the axis labels.

Using hash

You could use the hash function for the conversion:

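import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

xlab = myDataFrame.columnX.values
ylab = myDataFrame.columnY.values

# Convert each string label to an integer coordinate
X = [hash(l) for l in xlab]
Y = [hash(l) for l in ylab]

Z = myDataFrame.columnZ.values  # float

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
# Put the original strings back on the axes
ax.set_xticks(X)
ax.set_xticklabels(xlab)
ax.set_yticks(Y)
ax.set_yticklabels(ylab)
plt.show()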

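As M4rtini has pointed out in the comments, it's not clear what the spacing/scaling of string coordinates should be; the hash function could give unexpected spacings. In Python 3, string hashes are also randomized per interpreter run unless PYTHONHASHSEED is set, so the positions can change from one run to the next.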
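
To make the spacing concern concrete, here is a minimal sketch that looks at the gaps between hash-based positions. The label list is made up for illustration and stands in for the real column values.

```python
# Hypothetical labels, standing in for myDataFrame.columnX.values
labels = ["foo", "bar", "baz", "foo"]

# One hash-based coordinate per distinct label, in ascending order
positions = sorted({hash(label) for label in labels})

# Gaps between neighbouring coordinates; these are typically huge and uneven
gaps = [b - a for a, b in zip(positions, positions[1:])]

print(positions)
print(gaps)
```

The printed values will generally differ between runs, which is one more reason to prefer one of the uniform-spacing conversions.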

Nondegenerate uniform spacing

If you wanted to have the points uniformly spaced then you would have to use a different conversion. For example you could use

X = list(range(len(xlab)))

though that would cause each point to have a unique x-position even if the label is the same, and the x and y points would be correlated if you used the same approach for Y.

Degenerate uniform spacing

A third alternative is to first get the unique members of xlab (using e.g. set) and then map each xlab to a position using the unique set for the mapping; e.g.

xmap = {sn: i for i, sn in enumerate(sorted(set(xlab)))}  # sorted so the order is reproducible
X = [xmap[l] for l in xlab]
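
Putting the mapping together with the tick labels, a minimal pure-Python sketch might look like this. The helper name `encode_labels` and the sample labels are illustrative assumptions, not part of any library; sorting the unique labels is an extra step so that the positions are reproducible.

```python
def encode_labels(labels):
    """Map each label to an integer position; equal labels share a position."""
    uniques = sorted(set(labels))  # sorted so the order is reproducible
    position = {label: i for i, label in enumerate(uniques)}
    return uniques, [position[label] for label in labels]


xlab = ["foo", "baz", "bar", "foo", "baz", "bar"]  # made-up sample data
xticklabels, X = encode_labels(xlab)
print(xticklabels)  # ['bar', 'baz', 'foo']
print(X)            # [2, 1, 0, 2, 1, 0]
```

X can then be passed to ax.scatter, with ax.set_xticks(range(len(xticklabels))) and ax.set_xticklabels(xticklabels) putting the strings back on the axis.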

Upvotes: 2

unutbu

Reputation: 879749

You could use np.unique(..., return_inverse=True) to get representative ints for each string. For example,

In [117]: uniques, X = np.unique(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'], return_inverse=True)

In [118]: X
Out[118]: array([2, 1, 0, 2, 1, 0])

Note that X has dtype int32 here, as np.unique can handle at most 2**31 unique strings; the integer dtype of the inverse indices can differ between platforms and NumPy versions.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d

N = 12
arr = np.arange(N*2).reshape(N,2)
words = np.array(['foo', 'bar', 'baz', 'quux', 'corge'])
df = pd.DataFrame(words[arr % 5], columns=list('XY'))
df['Z'] = np.linspace(1, 1000, N)
Z = np.log10(df['Z'])
Xuniques, X = np.unique(df['X'], return_inverse=True)
Yuniques, Y = np.unique(df['Y'], return_inverse=True)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.scatter(X, Y, Z, s=20, c='b')
ax.set(xticks=range(len(Xuniques)), xticklabels=Xuniques,
       yticks=range(len(Yuniques)), yticklabels=Yuniques) 
plt.show()
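
As a quick sanity check on the encoding, indexing the unique values with the inverse indices should reproduce the original strings. A small sketch, reusing the sample words from the IPython session above (the variable names `labels`, `codes` and `restored` are just for illustration):

```python
import numpy as np

labels = np.array(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'])
uniques, codes = np.unique(labels, return_inverse=True)

# uniques is sorted; codes[i] is the index of labels[i] within uniques
restored = uniques[codes]

print(uniques)
print(codes)
print(np.array_equal(restored, labels))
```

This is also why the tick positions in the example are range(len(Xuniques)): each integer code is an index into the array of unique labels.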


Upvotes: 11
