Reputation: 61
Is it possible to group the data (to define the x and y variables) for a regression directly in regplot (or any other seaborn function)? I am unable to find a built-in feature of that sort.
For example, I have a column with a categorical variable "C", and I am trying to fit a regression line (with x and y) using the median of x and y for each category of C. Is there any functionality to do so?
Upvotes: 1
Views: 1373
Reputation: 2519
You need to group your data with pandas first and then plot it with seaborn. Since you didn't provide your dataframe, I will use a seaborn sample dataset to demonstrate.
import pandas as pd
import seaborn as sns
# load dataframe
df = sns.load_dataset('car_crashes')
In this dataframe, abbrev is the categorical column. I will use total and speeding as the y and x variables.
First, use the pandas .groupby() method, passing your categorical variable, and chain the .median() method so that pandas aggregates your data and returns the median for each category. The result is a dataframe indexed by the category, with one row of medians per category.
Then select the columns you want to plot. In our case, they are total and speeding. Finally, pass your x and y to seaborn's regplot().
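# group by the categorical column and take the median of each numeric column
medians = df.groupby('abbrev').median(numeric_only=True)
x = medians['speeding']
y = medians['total']
# plot (recent seaborn versions require x and y as keyword arguments)
sns.regplot(x=x, y=y)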
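To map this back to the case in the question, here is a minimal sketch on a small made-up dataframe. The column names "C", "x" and "y", the data values, and the helper names group_medians and plot_median_regression are all assumptions for illustration, not part of seaborn or pandas. The aggregation is done in pandas, and regplot only ever sees the per-category medians.

```python
import pandas as pd

# Hypothetical data: "C" is the categorical column, "x" and "y" are numeric.
df = pd.DataFrame({
    "C": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    "y": [2.0, 4.0, 9.0, 8.0, 10.0, 15.0, 14.0, 16.0, 30.0],
})


def group_medians(data, by, cols):
    """Return one row per category holding the median of each column in cols."""
    return data.groupby(by)[cols].median().reset_index()


def plot_median_regression(medians, x, y):
    """Draw a regression line fitted to the per-category medians."""
    # Imported here so the aggregation step can be run without seaborn installed.
    import seaborn as sns
    return sns.regplot(data=medians, x=x, y=y)


# One row per category of "C", with the median of x and y in each row.
medians = group_medians(df, "C", ["x", "y"])
print(medians)
```

Calling plot_median_regression(medians, "x", "y") then draws the scatter of medians with the fitted line. Keeping the aggregation in its own step also makes it easy to swap .median() for .mean() or another aggregate.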
Upvotes: 1