Aleksejs Fomins

Reputation: 900

Matplotlib / Seaborn violin plots for different data sizes

I have 3 one-dimensional data arrays A, B, C. They all have different lengths.

I would like to make a violin plot with 3 violins, one per array. How do I do this?

EDIT: I have solved the problem by writing a proxy function, but having to repeat the label in a column for every array feels wasteful. Is there a nicer or more efficient way to do it?

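import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def dict2pandas(d, keyname, valname):
    # Build one long-form frame: every value gets a row tagged with its array's key
    dframes = []
    for k, v in d.items():
        dframes.append(pd.DataFrame({keyname: [k] * len(v), valname: v}))
    return pd.concat(dframes, ignore_index=True)

data = {
    'A' : np.random.normal(1, 1, 100),
    'B' : np.random.normal(2, 1, 110),
    'C' : np.random.normal(3, 1, 120)
}

dataDF = dict2pandas(data, 'arrays', 'values')

fig, ax = plt.subplots()
# The target axes are passed as ax=, not axis=.
# Note: seaborn 0.13+ renames scale to density_norm.
sns.violinplot(data=dataDF, x='arrays', y='values', scale='width', ax=ax)
plt.show()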
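
One possible shortcut for the proxy function, sketched here under the assumption that a long-form frame with a label column and a value column is the goal: `np.repeat` can build the label column in one call and `np.concatenate` the value column, so no per-array DataFrame is needed. The name `long_df` and the seeded generator are illustrative choices only.

```python
import numpy as np
import pandas as pd

# Same shape of data as above, with a seeded generator for reproducibility
rng = np.random.default_rng(0)
data = {
    "A": rng.normal(1, 1, 100),
    "B": rng.normal(2, 1, 110),
    "C": rng.normal(3, 1, 120),
}

# Repeat each key once per element of its array, then stack all values
long_df = pd.DataFrame({
    "arrays": np.repeat(list(data.keys()), [len(v) for v in data.values()]),
    "values": np.concatenate(list(data.values())),
})
```

The resulting frame should be usable in the same way as `dataDF`, with `x='arrays'` and `y='values'`.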

Upvotes: 2

Views: 2817

Answers (2)

Matthew Walker

Reputation: 2755

I too could find no better idea than filling the Pandas DataFrame with NaNs, but this approach is perhaps a little tidier:

import numpy as np
import pandas as pd
import seaborn as sns

# OP's data
data = {
    'A' : np.random.normal(1, 1, 100),
    'B' : np.random.normal(2, 1, 110),
    'C' : np.random.normal(3, 1, 120)
}

# Create DataFrame where NaNs fill shorter arrays 
df = pd.DataFrame([data['A'], data['B'], data['C']]).transpose()

# Label the columns of the DataFrame
df = df.set_axis(['A','B','C'], axis=1)

# Violin plot  
sns.violinplot(data=df)
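
A possible variation on the same idea, as a sketch rather than a recommendation: wrapping each array in a `pd.Series` lets pandas align the columns on their default integer index, so the shorter columns are filled with NaN and the column labels come from the dictionary keys without a separate `set_axis` call. The seeded generator is an illustrative choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = {
    "A": rng.normal(1, 1, 100),
    "B": rng.normal(2, 1, 110),
    "C": rng.normal(3, 1, 120),
}

# Series of unequal length are aligned on their index; missing rows become NaN
df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})
```

This `df` has the same wide layout as the frame built above, so it can be passed to `sns.violinplot(data=df)` in the same way.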

Upvotes: 0

Diziet Asahi

Reputation: 40697

Although it amounts to roughly the same thing, you could pad your NumPy arrays with NaN so they are all the same size. Then they can be put in a DataFrame for plotting with seaborn:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = {
    'A' : np.random.normal(1, 1, 100),
    'B' : np.random.normal(2, 1, 110),
    'C' : np.random.normal(3, 1, 120)
}
# Length of the longest array
maxsize = max(a.size for a in data.values())
# Pad the end of each array with NaN up to maxsize
data_pad = {k: np.pad(v, pad_width=(0, maxsize - v.size), mode='constant', constant_values=np.nan) for k, v in data.items()}
df = pd.DataFrame(data_pad)

fig, ax = plt.subplots()
sns.violinplot(data=df, ax=ax)
plt.show()
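
If seaborn is not a requirement, a minimal sketch with plain Matplotlib avoids padding altogether, because `Axes.violinplot` accepts a sequence of arrays of different lengths. The `Agg` backend line, the seeded generator and the `showmedians=True` option are illustrative choices, not part of the original question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = {
    "A": rng.normal(1, 1, 100),
    "B": rng.normal(2, 1, 110),
    "C": rng.normal(3, 1, 120),
}

fig, ax = plt.subplots()
# One violin per array; by default the violins sit at x = 1, 2, 3
parts = ax.violinplot(list(data.values()), showmedians=True)
ax.set_xticks(range(1, len(data) + 1))
ax.set_xticklabels(list(data.keys()))
```

Calling `plt.show()` with an interactive backend, or `fig.savefig(...)`, would then display or save the figure.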

Upvotes: 5
