Prabhat

Reputation: 61

Using group by in regression to define x and y values in python

Is it possible to group the data (to define the x and y variables) for a regression directly in regplot (or any other seaborn function)? I can't find a built-in feature for this.

For example, my data has a categorical column "C", and I want to fit a regression line through the median of x and the median of y for each category of C. Is there any functionality to do this?

Upvotes: 1

Views: 1373

Answers (1)

steven

Reputation: 2519

You need to group your data with pandas first and then plot it with seaborn. Since you didn't provide your dataframe, I will use a seaborn sample dataset to demonstrate.

import pandas as pd
import seaborn as sns
# load dataframe
df = sns.load_dataset('car_crashes')

In this dataframe, the abbrev column is the categorical column. I will use speeding as x and total as y.

First, call the pandas .groupby() method with your categorical variable and chain .median() onto it, so that pandas aggregates the data and returns the median of each column per category. The result is a dataframe with one row per category.

Then select the columns you want to plot, in our case speeding and total, and pass them as x and y to seaborn's regplot().

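# group by the categorical column and take the median of each column
medians = df.groupby('abbrev').median()
x = medians.speeding
y = medians.total
# plot (recent seaborn versions require x and y as keyword arguments)
sns.regplot(x=x, y=y)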
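If you also want the numbers behind the line, here is a minimal sketch of the same idea on made-up data. The dataframe and its column names ("C", "x", "y") are hypothetical, chosen only to mirror the question. It uses np.polyfit for a straight-line least-squares fit, which should match the line regplot draws with its default settings, but that is my assumption rather than something taken from the seaborn docs.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: several rows per category in column "C"
df = pd.DataFrame({
    "C": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    "y": [2.0, 4.0, 9.0, 8.0, 10.0, 15.0, 14.0, 16.0, 30.0],
})

# One row per category, holding the median of x and the median of y
medians = df.groupby("C")[["x", "y"]].median()

# Least-squares straight line through the per-category medians
slope, intercept = np.polyfit(medians["x"], medians["y"], deg=1)
print(medians)
print(slope, intercept)
```

Selecting only the columns you need before calling .median() also avoids problems when the dataframe has other non-numeric columns. The medians["x"] and medians["y"] columns can then be passed to sns.regplot as the x and y keyword arguments.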


Upvotes: 1
