Simon Lindgren

Reputation: 2031

How to format data for use with seaborn

I have calculated document distances, and am using MDS in sklearn to plot them with matplotlib. I want to plot them with seaborn (pairplot) but don't know how to translate the MDS data so that it is readable by seaborn.

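import matplotlib.pyplot as plt
from sklearn.manifold import MDS

# dist: precomputed square matrix of document distances (defined elsewhere)
# labels: one name per document (defined elsewhere)
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=1)
pos = mds.fit_transform(dist)
xs, ys = pos[:, 0], pos[:, 1]

names = list(labels)

# Plot each document as a point annotated with its name
color = "tab:blue"  # assumed value; `color` was not defined in the original snippet
for x, y, name in zip(xs, ys, names):
    plt.scatter(x, y, color=color)
    plt.text(x, y, name)

plt.show()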

Upvotes: 3

Views: 4029

Answers (2)

R.Falque

Reputation: 944

As a complement to Diziet Asahi's response, here is a minimal example that creates a DataFrame and plots it:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

data = {'col1': [1, 1, 1, 1, 1, 1, 12, 3, 4, 5], 'col2': [1, 1, 1, 1, 1, 1, 12, 3, 4, 5]}
df = pd.DataFrame(data)
sns.violinplot(data=df, palette="Pastel1")
plt.show()

The result is a violin plot with one violin per column.

There are other ways to build a pandas DataFrame as well.
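As a rough illustration of some of those other ways, here is a small sketch that builds the same table from a dict of columns, from a list of row records, and from a 2-D NumPy array. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# From a dict mapping column names to lists of values
df_from_dict = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# From a list of row records (one dict per observation)
records = [
    {"col1": 1, "col2": 4},
    {"col1": 2, "col2": 5},
    {"col1": 3, "col2": 6},
]
df_from_records = pd.DataFrame.from_records(records)

# From a 2-D NumPy array, naming the columns explicitly
array = np.array([[1, 4], [2, 5], [3, 6]])
df_from_array = pd.DataFrame(array, columns=["col1", "col2"])

print(df_from_array)
```

The array form is the one that matters most here, since MDS returns a NumPy array.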

Upvotes: 1

Diziet Asahi

Reputation: 40697

As stated in the documentation for pairplot(), this function expects a long-form DataFrame in which each column is a variable and each row is an observation. The easiest approach is to build this DataFrame with pandas. pairplot() requires a pandas DataFrame, so a NumPy array such as the MDS output has to be wrapped in one first.

A long-form DataFrame has as many rows as there are observations, with one column per variable. Much of seaborn's power comes from using categorical columns to split the DataFrame into different groups, for example through the hue parameter.

In your case the dataframe would probably look like:

    X           Y           label
0   0.094060    0.484758    Label_00
1   0.375537    0.150206    Label_00
2   0.215755    0.796629    Label_02
3   0.204077    0.921016    Label_01
4   0.673787    0.884718    Label_01
5   0.854112    0.044506    Label_00
6   0.225218    0.552961    Label_00
7   0.668262    0.482514    Label_00
8   0.935415    0.100438    Label_00
9   0.697016    0.633550    Label_01
(...)

And you would pass it to pairplot like so:

sns.pairplot(data=df, hue='label')


Upvotes: 2
