Ruslan Pylypiuk

Reputation: 145

Messed up plots using boxplot with Seaborn

I was using this code to plot all data in my df:

import matplotlib.pyplot as plt
import seaborn as sns

# df is the bookings DataFrame, loaded elsewhere
m_cols = ['is_canceled','lead_time', 'arrival_date_year','arrival_date_week_number','arrival_date_day_of_month','stays_in_weekend_nights','adults','children','babies','is_repeated_guest','previous_cancellations','previous_bookings_not_canceled','booking_changes','agent','total_of_special_requests']
for col in m_cols:
    sns.boxplot(y=df['is_canceled'].astype('category'), x=col, data=df)
    plt.show()

But I got a few plots that look like this. How can I fix it?

Upvotes: 1

Views: 422

Answers (1)

JohanC

Reputation: 80329

The boxplots seem to show that the large majority of values are zero, and the rest are drawn as outliers. For example, previous_cancellations is usually zero, and only a few rows have some other specific value. All outliers with the same value are drawn on top of each other. Note that the "box" of a boxplot spans the 25th to the 75th percentile, with a division at the median, so when more than 75% of the values are zero, the whole box collapses into a single line at zero.
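To make this concrete, here is a minimal sketch on made-up data (the values list is an assumption for illustration, not taken from the question's df). It uses the standard library's statistics.quantiles rather than seaborn's internals, so the exact interpolation may differ slightly from what matplotlib computes, but the effect is the same: with 90% zeros, all three quartiles are zero and every non-zero value falls outside the whiskers.

```python
from statistics import quantiles

# Hypothetical column: 90 zeros and a handful of non-zero values,
# similar in shape to a mostly-zero count column.
values = [0] * 90 + [1] * 5 + [2] * 3 + [5] * 2

# Quartiles: the box spans q1..q3 with a line at the median.
q1, median, q3 = quantiles(values, n=4)
iqr = q3 - q1

# Conventional whisker limits at 1.5 * IQR beyond the box.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Everything outside the limits is drawn as an outlier marker.
outliers = [v for v in values if v < lower_limit or v > upper_limit]

print("q1, median, q3:", q1, median, q3)
print("number of outliers:", len(outliers))
print("distinct outlier positions:", sorted(set(outliers)))
```

Ten points end up as outliers, but they occupy only three distinct positions, which is why the plot shows just a few markers next to a flat box.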

One idea is to use a different type of plot, e.g. a violinplot. Here is a comparison using the titanic dataset:

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('titanic')
m_cols = df.select_dtypes('number').columns.to_list()[1:]
fig, axs = plt.subplots(nrows=len(m_cols), ncols=2, figsize=(15, 7))
for col, ax_row in zip(m_cols, axs):
    sns.boxplot(y=df['survived'].astype('category'), x=col, data=df, ax=ax_row[0], palette='rocket')
    sns.violinplot(y=df['survived'].astype('category'), x=col, data=df, ax=ax_row[1], palette='rocket')
sns.despine()
plt.tight_layout()
plt.show()

sns.violinplot vs sns.boxplot to compare

Upvotes: 1
