Reputation: 4771
This seems like it should be an easy one, but I can't figure it out. I have a pandas DataFrame and would like to make a 3D scatter plot from three of its columns. The X and Y columns are not numeric; they are strings, but I don't see why this should be a problem.
X = myDataFrame.columnX.values  # string
Y = myDataFrame.columnY.values  # string
Z = myDataFrame.columnZ.values  # float
fig = pl.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
pl.show()
Isn't there an easy way to do this? Thanks.
Upvotes: 10
Views: 12437
Reputation: 19553
Scatter does this automatically now (from at least matplotlib 2.1.0):
plt.scatter(['A', 'B', 'B', 'C'], [0, 1, 2, 1])
Upvotes: 3
Reputation: 12773
Try converting the characters to numbers for the plotting and then use the characters again for the axis labels.
Using hash
You could use the hash function for the conversion:
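import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
xlab = myDataFrame.columnX.values
ylab = myDataFrame.columnY.values
# Convert each string label to an integer position
X = [hash(l) for l in xlab]
Y = [hash(l) for l in ylab]
Z = myDataFrame.columnZ.values  # float
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
# Put the original strings back on the axes as tick labels
ax.set_xticks(X)
ax.set_xticklabels(xlab)
ax.set_yticks(Y)
ax.set_yticklabels(ylab)
plt.show()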
As M4rtini has pointed out in the comments, it's not clear what the spacing/scaling of string coordinates should be; the hash function could give unexpected spacings.
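To see why hash can give awkward spacings, here is a minimal sketch that prints the hash of a few labels and the gaps between them. The labels are made up for illustration. Note also that Python 3 randomizes string hashes per process unless PYTHONHASHSEED is set, so the positions will generally change from run to run.

```python
# Hypothetical labels, for illustration only.
labels = ['foo', 'bar', 'baz']

# hash() maps each string to an arbitrary (typically very large) integer.
positions = [hash(label) for label in labels]

# Gaps between consecutive positions once sorted; these are generally
# unequal and huge compared with the handful of categories.
ordered = sorted(positions)
gaps = [b - a for a, b in zip(ordered, ordered[1:])]

for label, pos in zip(labels, positions):
    print(label, pos)
print('gaps:', gaps)
```

Equal labels still map to equal positions, which is the one property this approach does preserve.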
Nondegenerate uniform spacing
If you want the points uniformly spaced, you have to use a different conversion. For example, you could use
X = list(range(len(xlab)))
though that gives each point a unique x-position even if the label is the same, and the x and y positions would be correlated if you used the same approach for Y.
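A small sketch of that caveat, using made-up labels: with one position per row, repeated labels land at different positions, and X and Y come out identical when both are built the same way.

```python
# Hypothetical labels with a repeated value, for illustration only.
xlab = ['foo', 'bar', 'foo', 'baz']
ylab = ['u', 'v', 'w', 'u']

# One position per row: repeated labels do NOT share a position.
X = list(range(len(xlab)))
Y = list(range(len(ylab)))

print(X)  # [0, 1, 2, 3]

# Both 'foo' rows end up at different x-positions.
foo_positions = [x for x, label in zip(X, xlab) if label == 'foo']
print(foo_positions)  # [0, 2]

# X and Y are perfectly correlated, so all points fall on a diagonal.
print(X == Y)  # True
```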
Degenerate uniform spacing
A third alternative is to first get the unique members of xlab (using e.g. set) and then map each xlab to a position using the unique set for the mapping; e.g.
xmap = {sn: i for i, sn in enumerate(set(xlab))}
X = [xmap[l] for l in xlab]
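As a worked sketch of this mapping with made-up labels: the variant below sorts the unique labels first, which is an addition to the approach above (plain set iteration order is arbitrary), so that the label-to-position assignment is reproducible. The names unique_labels, tick_positions and tick_labels are illustrative only.

```python
# Hypothetical labels with a repeated value, for illustration only.
xlab = ['foo', 'bar', 'foo', 'baz']

# sorted() is an assumption added here to make the mapping deterministic.
unique_labels = sorted(set(xlab))  # ['bar', 'baz', 'foo']
xmap = {label: i for i, label in enumerate(unique_labels)}

# Repeated labels now share one position.
X = [xmap[label] for label in xlab]
print(X)  # [2, 0, 2, 1]

# Positions and labels that could be passed to
# ax.set_xticks(...) and ax.set_xticklabels(...).
tick_positions = list(range(len(unique_labels)))
tick_labels = unique_labels
```

With this mapping only one tick per distinct label is needed, rather than one per data point.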
Upvotes: 2
Reputation: 879749
You could use np.unique(..., return_inverse=True) to get representative ints for each string. For example,
In [117]: uniques, X = np.unique(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'], return_inverse=True)
In [118]: X
Out[118]: array([2, 1, 0, 2, 1, 0])
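Note that X is an integer array. Here it has dtype int32, though the exact integer dtype can vary with the platform and NumPy version; with int32 codes, np.unique can represent at most 2**31 unique strings.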
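A minimal sketch of what return_inverse gives you, using the same labels as the session above: the returned codes index into the sorted unique labels, so indexing the uniques with the codes recovers the original array.

```python
import numpy as np

labels = np.array(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'])
uniques, X = np.unique(labels, return_inverse=True)

print(uniques)  # sorted unique labels: ['bar' 'baz' 'foo']
print(X)        # [2 1 0 2 1 0]

# Indexing the uniques with the codes round-trips to the original labels.
roundtrip = uniques[X]
print(np.array_equal(roundtrip, labels))  # True

# Integer dtype of the codes; may differ across platforms and NumPy versions.
print(X.dtype)
```

Because the uniques are sorted, the codes double as evenly spaced tick positions, with uniques supplying the tick labels.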
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d
N = 12
arr = np.arange(N*2).reshape(N,2)
words = np.array(['foo', 'bar', 'baz', 'quux', 'corge'])
df = pd.DataFrame(words[arr % 5], columns=list('XY'))
df['Z'] = np.linspace(1, 1000, N)
Z = np.log10(df['Z'])
Xuniques, X = np.unique(df['X'], return_inverse=True)
Yuniques, Y = np.unique(df['Y'], return_inverse=True)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.scatter(X, Y, Z, s=20, c='b')
ax.set(xticks=range(len(Xuniques)), xticklabels=Xuniques,
       yticks=range(len(Yuniques)), yticklabels=Yuniques)
plt.show()
Upvotes: 11