EDIT: A solution was provided by user kgoettler below. The problem arises because Seaborn's boxplot expects the data in long format: one column naming the variable (plotted on the x-axis) and one column holding the values (plotted on the y-axis). The script below reorganizes the data into that form.
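To make the wide-versus-long difference concrete, here is a minimal sketch using pandas' `DataFrame.melt`, a swapped-in alternative to the `stack` chain in the answer. The column names `metric_a`/`metric_b` and the values are hypothetical.

```python
import pandas as pd

# Hypothetical wide-form data: one row per Object, one column per metric.
wide = pd.DataFrame({
    "Object": ["MT1", "MT2"],
    "metric_a": [0.4, -0.1],
    "metric_b": [1.9, 3.3],
})

# Long form: one row per (Object, metric) pair. "Metric" holds the former
# column header (for x=...), "Score" holds the value (for y=...).
long_df = wide.melt(id_vars="Object", var_name="Metric", value_name="Score")
print(long_df)
```

Unlike the `stack` version, `melt` orders the rows metric by metric (all `metric_a` rows first, then all `metric_b` rows), which makes no difference to the box plot.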
ORIGINAL QUESTION: My goal is to generate a box plot from data in a CSV file using the Python visualization library Seaborn. The data is organized with a common index column (Object) and a header for each column.
I am having difficulty getting this data into a Seaborn boxplot, which uses the format
seaborn.boxplot(x="variable", y="value")
With pandas' own boxplot this is not a problem, since I can simply specify which columns to use by their headers:
boxplot = data.boxplot(column=['header1', 'header2', 'header3'])
I would also prefer not to have to specify each column by header, but rather select all columns in the file automatically.
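One way to avoid typing the headers, sketched under the assumption that every numeric column is a metric to plot, is to let pandas collect the column names with `select_dtypes`. The helper name `metric_columns` and the sample headers are hypothetical.

```python
import pandas as pd


def metric_columns(df: pd.DataFrame) -> list:
    """Return the names of all numeric columns (assumed to be the metrics)."""
    return df.select_dtypes(include="number").columns.tolist()


# Hypothetical data standing in for the CSV contents.
data = pd.DataFrame({
    "Object": ["MT1", "MT2"],
    "header1": [0.4, -0.1],
    "header2": [1.9, 3.3],
    "header3": [0.5, 0.7],
})

cols = metric_columns(data)
print(cols)  # ['header1', 'header2', 'header3']

# With matplotlib installed, the list replaces the hand-typed one:
# boxplot = data.boxplot(column=cols)
```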
All feedback and input is greatly appreciated!
Something like this should work:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
sns.set(style='whitegrid')
csv_file = '/path/to/file.csv'
df = pd.read_csv(csv_file)
df = (df
    .set_index(['Object'])           # Set 'Object' column to index
    .rename_axis("Metric", axis=1)   # Rename the column axis "Metric"
    .stack()                         # Stack the columns into the index
    .rename('Score')                 # Rename the remaining column 'Score'
    .reset_index()                   # Reset the index
)
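To try the chain without a real file, the sketch below feeds it a small in-memory CSV through `io.StringIO`. The headers and numbers are made up and only illustrate the shape of the result.

```python
import io

import pandas as pd

# Hypothetical CSV contents: an 'Object' column plus one column per metric.
csv_text = """Object,metric_a,metric_b
MT1,0.4,1.9
MT2,-0.1,3.3
"""

df = pd.read_csv(io.StringIO(csv_text))
df = (
    df
    .set_index(["Object"])           # Object becomes the row index
    .rename_axis("Metric", axis=1)   # name the column axis "Metric"
    .stack()                         # -> Series indexed by (Object, Metric)
    .rename("Score")                 # name the values "Score"
    .reset_index()                   # back to three ordinary columns
)
print(df)
```

Each row of the result is one (Object, Metric) pair, so the row count is the number of objects times the number of metric columns.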
This should give you a DataFrame that looks like:
Object Metric Score
0 MT1 B1A1 Average Splaying Score 0.426824
1 MT1 B1A2 Average Splaying Score 0.431351
2 MT1 B1A3 Average Splaying Score 1.941473
3 MT2 B1A1 Average Splaying Score -0.021672
4 MT2 B1A2 Average Splaying Score 3.357387
Then to plot, all you have to do is:
fig, ax = plt.subplots(figsize=(10, 6))
ax = sns.boxplot(x='Metric', y='Score', data=df, ax=ax)
ax.set_xlabel('')  # hide the "Metric" axis label; the tick labels already name each box
plt.show()
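For running the plot without a display (for example on a server), here is a minimal sketch that selects matplotlib's non-interactive Agg backend and writes the figure to an in-memory PNG. It assumes seaborn is installed, and the small long-form frame is hypothetical.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical long-form data in the Object/Metric/Score layout.
df = pd.DataFrame({
    "Object": ["MT1", "MT1", "MT2", "MT2"],
    "Metric": ["metric_a", "metric_b", "metric_a", "metric_b"],
    "Score": [0.4, 1.9, -0.1, 3.3],
})

sns.set(style="whitegrid")
fig, ax = plt.subplots(figsize=(10, 6))
ax = sns.boxplot(x="Metric", y="Score", data=df, ax=ax)
ax.set_xlabel("")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # pass a file path instead to save to disk
plt.close(fig)
```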