tomer schmid

Reputation: 1

How to compare all rows vs. certain rows in a boxplot

I have a dataframe of properties with a Boolean column that marks whether each property has a waterfront, and a column with the price of each property. I want to compare the median price of properties with a waterfront against the median price of all properties (with and without a waterfront). I was thinking of using a boxplot from sns.catplot. How do I make the plot show two boxes, one for waterfront properties and one for all properties? My code is:

h = sns.catplot(x="waterfront", y="price", kind="box", data=df)

As expected, this code makes two boxes: one for properties with a waterfront and one for properties without.

Upvotes: 0

Views: 70

Answers (1)

JohanC

Reputation: 80429

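One approach is to make a copy of the rows with a waterfront, and then add a new column to both dataframes that distinguishes the waterfront copy from the full set ('all'). Concatenating the two gives a single dataframe in which every waterfront row appears twice, once in each group.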

Here is some test code to illustrate the idea:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

df = pd.DataFrame({'waterfront': np.random.randint(0, 2, 50, dtype=bool),
                   'price': np.random.randint(200000, 900000, 50)})
df1 = df[df['waterfront']].copy()  # create a copy of the dataframe with only the waterfronts
df['waterfront vs all'] = 'all'
df1['waterfront vs all'] = 'waterfront'
sns.catplot(data=pd.concat([df, df1], ignore_index=True),  # ignore_index avoids duplicate row labels
            x='waterfront vs all', y='price', kind='box')
plt.tight_layout()
plt.show()

[Image: sns.catplot, selection of rows vs all rows]
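Since the question is about comparing medians, it can also help to check the numbers behind the boxes. The sketch below is a minimal example, assuming the same kind of random test data and the same 'waterfront vs all' labelling as above; it uses numpy's default_rng instead of np.random.randint only so the data is reproducible. The per-group medians should match the middle line drawn in each box.

```python
import numpy as np
import pandas as pd

# Random test data in the same format as above (assumed columns 'waterfront' and 'price')
rng = np.random.default_rng(42)
df = pd.DataFrame({'waterfront': rng.integers(0, 2, 50).astype(bool),
                   'price': rng.integers(200000, 900000, 50)})

# Same trick as for the plot: copy the waterfront rows and label both groups
df1 = df[df['waterfront']].copy()
df['waterfront vs all'] = 'all'
df1['waterfront vs all'] = 'waterfront'
combined = pd.concat([df, df1], ignore_index=True)

# Median price per group
medians = combined.groupby('waterfront vs all')['price'].median()
print(medians)
```

The 'all' median is simply the median of the whole price column, and the 'waterfront' median is the median of the waterfront rows only, so the duplication does not distort either group.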

Instead of a boxplot, a swarmplot could also be interesting. Here is how the two look combined:

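combined = pd.concat([df, df1], ignore_index=True)  # waterfront rows appear in both groups
ax = sns.boxplot(data=combined, x='waterfront vs all', y='price', boxprops={'alpha': 0.3})
sns.swarmplot(data=combined, x='waterfront vs all', y='price', ax=ax)  # individual points on the same axes
plt.show()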

[Image: boxplot with swarmplot]

Note that most seaborn functions also accept an order= parameter to force an ordering of the x-values (e.g. order=['waterfront', 'all']).
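A sketch of the order= parameter in use, assuming the same test data and 'waterfront vs all' column as in the answer. Without order=, seaborn places string categories in order of first appearance, so 'all' would come first here because those rows come first in the concatenated dataframe. The Agg backend and the explicit canvas draw are only there so the sketch runs without a display and the tick labels can be read back.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random test data (assumed columns 'waterfront' and 'price', as in the question)
rng = np.random.default_rng(0)
df = pd.DataFrame({'waterfront': rng.integers(0, 2, 50).astype(bool),
                   'price': rng.integers(200000, 900000, 50)})

# Copy the waterfront rows and label the two groups
df1 = df[df['waterfront']].copy()
df['waterfront vs all'] = 'all'
df1['waterfront vs all'] = 'waterfront'
combined = pd.concat([df, df1], ignore_index=True)

# order= puts the 'waterfront' box first, regardless of row order in the data
g = sns.catplot(data=combined, x='waterfront vs all', y='price', kind='box',
                order=['waterfront', 'all'])
g.ax.figure.canvas.draw()  # render once so the tick labels are populated
tick_labels = [t.get_text() for t in g.ax.get_xticklabels()]
print(tick_labels)
plt.close('all')
```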

Upvotes: 1
