How to format data for use with seaborn

Question

I have calculated document distances and am using MDS from sklearn to plot them with matplotlib. I would like to plot them with seaborn (pairplot) instead, but I don't know how to convert the MDS output into a form that seaborn can read.

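import matplotlib.pyplot as plt
from sklearn.manifold import MDS

# dist is the precomputed document-distance matrix;
# labels and color are defined elsewhere (not shown)
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=1)
pos = mds.fit_transform(dist)
xs, ys = pos[:, 0], pos[:, 1]

names = list(labels)

# Plot each document as a point with its name next to it
for x, y, name in zip(xs, ys, names):
    plt.scatter(x, y, color=color)
    plt.text(x, y, name)

plt.show()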

Diziet Asahi · Accepted Answer

As stated in the documentation for pairplot(), this function expects a long-form dataframe where each column is a variable and each row is an observation. The easiest approach is to build this dataframe with pandas; pairplot() checks that its data argument is a DataFrame, so a bare numpy array should be wrapped in one first.

A long-form dataframe has one row per observation and one column per variable. The power of seaborn lies in using categorical columns to split the dataframe into different groups.

In your case the dataframe would probably look like:

    X           Y           label
0   0.094060    0.484758    Label_00
1   0.375537    0.150206    Label_00
2   0.215755    0.796629    Label_02
3   0.204077    0.921016    Label_01
4   0.673787    0.884718    Label_01
5   0.854112    0.044506    Label_00
6   0.225218    0.552961    Label_00
7   0.668262    0.482514    Label_00
8   0.935415    0.100438    Label_00
9   0.697016    0.633550    Label_01
(...)
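
One way to build a frame of this shape from the MDS output could look like the sketch below. The helper name mds_to_frame is hypothetical, and the random coordinates are only a stand-in for the pos array returned by mds.fit_transform(dist) in the question; the column names X, Y and label follow the table above.

```python
import numpy as np
import pandas as pd


def mds_to_frame(pos, labels):
    """Combine an (n, 2) coordinate array and n labels into a long-form DataFrame.

    Hypothetical helper for illustration: one row per document,
    one column per variable (X, Y, label).
    """
    pos = np.asarray(pos)
    return pd.DataFrame({"X": pos[:, 0], "Y": pos[:, 1], "label": list(labels)})


# Stand-in data (assumption): random coordinates instead of real MDS output
rng = np.random.default_rng(0)
pos = rng.random((6, 2))
labels = ["Label_00", "Label_00", "Label_02", "Label_01", "Label_01", "Label_00"]

df = mds_to_frame(pos, labels)
print(df.head())
```

With the real data, passing the array from mds.fit_transform(dist) and your own labels to the same constructor should give the equivalent frame.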

And you would pass it to pairplot like so:

sns.pairplot(data=df, hue='label')
