Christer Edvardsson

Reputation: 147

Difficulties importing data into Seaborn Boxplot

EDIT: A solution was provided by user kgoettler below. The problem arose because Seaborn's boxplot, when called with x and y, expects long-form data: one column naming the variable (plotted on the x-axis) and one column holding the values (plotted on the y-axis). The script below reshapes the data into a form compatible with that call.

ORIGINAL QUESTION: My goal is to generate a box plot using data from a CSV file. I would like to use the Python visualization library Seaborn. The data is organized with a common index (Object) and a header for each column.

Image of raw data

I am having difficulty getting this data into a Seaborn boxplot using the format

seaborn.boxplot(x="variable", y="value")

Using pandas' own boxplot this is not a problem, since I can simply specify which columns to use by their headers, in the following format

boxplot = data.boxplot(column=['header1', 'header2', 'header3'])

Image of pandas boxplot using raw data

I would also prefer not to have to specify each individual column by header, but rather select all columns in the file automatically.

All feedback and input is greatly appreciated!

Upvotes: 0

Views: 458

Answers (1)

kgoettler

Reputation: 169

Something like this should work:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
sns.set(style='whitegrid')

csv_file = '/path/to/file.csv'
df = pd.read_csv(csv_file)
df = (df
        .set_index(['Object'])          # Set 'Object' column to index
        .rename_axis("Metric", axis=1)  # Rename the column axis "Metric"
        .stack()                        # Stack the columns into the index
        .rename('Score')                # Rename the remaining column 'Score'
        .reset_index()                  # Reset the index
    )

This should give you a DataFrame that looks like:

   Object                       Metric     Score
0     MT1  B1A1 Average Splaying Score  0.426824
1     MT1  B1A2 Average Splaying Score  0.431351
2     MT1  B1A3 Average Splaying Score  1.941473
3     MT2  B1A1 Average Splaying Score -0.021672
4     MT2  B1A2 Average Splaying Score  3.357387
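
For comparison, the same wide-to-long reshape can probably also be written with pandas' melt. This is only a sketch: the column names B1A1 and B1A2 and all values below are made up to mimic the layout described in the question, not taken from the actual CSV.

```python
import pandas as pd

# Hypothetical wide-form data: one 'Object' column plus one column per metric.
# Column names and values are invented for illustration only.
wide = pd.DataFrame({
    "Object": ["MT1", "MT2"],
    "B1A1": [0.4, -0.1],
    "B1A2": [0.5, 3.3],
})

# melt() keeps 'Object' as an identifier and unpivots every other column
# into a (Metric, Score) pair, so no column has to be named explicitly.
long_df = wide.melt(id_vars="Object", var_name="Metric", value_name="Score")

# melt() orders rows column by column; sort to get the per-object ordering
# that the stack() approach produces.
long_df = long_df.sort_values(["Object", "Metric"], ignore_index=True)
print(long_df)
```

One difference worth checking on real data: depending on the pandas version, stack() may drop missing values, whereas melt() keeps them as NaN rows.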

Then to plot, all you have to do is:

fig, ax = plt.subplots(figsize=(10,6))
ax = sns.boxplot(x='Metric', y='Score', data=df, ax=ax)
ax.set_xlabel('')

Example Plot

Upvotes: 1
