Reputation: 23
I'm transitioning from R to Python and want to plot the mean line of two variables: the x variable is split into intervals along the x axis, and the mean of the y variable within each interval goes on the y axis.
For example, if I have 1000 points (x1, y1) to (x1000, y1000) and want to plot them in 3 bins, I would have 3 bars, one per x interval, each showing the mean of the y values that fall into that interval.
Does anyone know what this plot is called, and how I can do it in Python? In R, I use the "cut" function and then plot the x, y of the cut.
Thanks!
Upvotes: 2
Views: 1632
Reputation: 24742
For the follow-up question, we can do something more powerful with a boxplot, which shows the spread of Y in each bin rather than only its mean.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# simulate some artificial data
x = np.random.randn(1000,)
y = 5 * x ** 2 + np.random.randn(1000,)
data = pd.DataFrame({'X': x, 'Y': y})
# now do your stuff
# ================================
# use the pandas 'cut' function
data['X_bins'] = pd.cut(data.X, 3)
# draw one box of Y per X interval
data.boxplot(column='Y', by='X_bins')
plt.show()
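If you also want the numbers behind the boxes, a per-bin summary can be computed with groupby. This is only a minimal sketch: the `toy` data and the bin edges are made up for illustration (they are not from the question), and the statistics chosen are just a few of the ones a boxplot conveys.

```python
import pandas as pd

# Hypothetical toy data, small enough to check by hand
toy = pd.DataFrame({
    "X": [0.5, 1.0, 1.5, 2.5, 3.0, 3.5],
    "Y": [1.0, 2.0, 6.0, 10.0, 20.0, 60.0],
})

# Two explicit bins: (0, 2] and (2, 4]
toy["X_bins"] = pd.cut(toy["X"], bins=[0, 2, 4])

# Per-bin summary of Y; observed=False keeps bins that happen to be empty
summary = toy.groupby("X_bins", observed=False)["Y"].agg(
    ["min", "median", "mean", "max"]
)
print(summary)
```

Each row of `summary` corresponds to one box in the plot, so the median column should line up with the line drawn inside each box.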
Upvotes: 2
Reputation: 24742
Here is one way to do it: bin X with the pandas 'cut' function, then take the mean of Y within each bin.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# simulate some artificial data
x = np.random.randn(1000,)
y = 5 * x ** 2 + np.random.randn(1000,)
data = pd.DataFrame({'X': x, 'Y': y})
# now do your stuff
# ================================
# use the pandas 'cut' function
data['X_bins'] = pd.cut(data.X, 3)
# for each bin, calculate the mean of Y
result = data.groupby('X_bins', observed=False)['Y'].mean()
# do the plot
result.plot()
plt.show()
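To see what the cut-then-groupby step produces, here is a small sketch on hand-checkable data. The `toy` frame and the explicit `edges` are assumptions made up for illustration; with real data you would probably pass a bin count to pd.cut as in the code above.

```python
import pandas as pd

# Hypothetical toy data, chosen so the bin means are easy to verify by hand
toy = pd.DataFrame({
    "X": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "Y": [1.0, 3.0, 10.0, 20.0, 100.0, 200.0],
})

# Explicit bin edges; right=False makes the intervals [0, 2), [2, 4), [4, 6)
edges = [0, 2, 4, 6]
toy["X_bins"] = pd.cut(toy["X"], bins=edges, right=False)

# Mean of Y within each X interval; observed=False keeps empty bins too
bin_means = toy.groupby("X_bins", observed=False)["Y"].mean()
print(bin_means.tolist())  # [2.0, 15.0, 150.0]
```

If you want the bars described in the question rather than a line, `bin_means.plot(kind="bar")` should draw one bar per interval.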
Upvotes: 1