Applying a regression to each column or row of a pandas DataFrame without using for loops
There is a similar post, Apply formula across pandas rows/ regression line, that runs a regression for each row; however, plotting the answer given there shows that it is wrong. I couldn't comment on it because I don't have enough reputation. The main problem with that answer is that it takes the values of the columns but then uses the apply function on each row.
Currently I only know how to do it one column at a time, e.g.:
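import numpy as np
import pandas as pd
import scipy.stats
np.random.seed(1997)
df = pd.DataFrame(np.random.randn(10, 4))
# Regress columns 0 and 1 against the row index, one call per column
first_stats = scipy.stats.linregress(df.index, df[0])
second_stats = scipy.stats.linregress(df.index, df[1])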
I was hoping for an answer that avoids writing a function or for loops, similar to pandas' df.sum(), except that instead of a sum I want a regression that returns the slope, intercept, r-value, p-value and standard error.
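For the row-wise case, the x-values have to run across the columns rather than down the index; mixing the two up is the problem described in the linked post. A minimal sketch, assuming the column positions 0..n-1 serve as x (the names x, labels and row_stats are illustrative only):

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

np.random.seed(1997)
df = pd.DataFrame(np.random.randn(10, 4))

# Row-wise: each row supplies the y-values, so x must run across the
# columns. Assumption: x is the column positions 0..3.
x = np.arange(df.shape[1])
labels = ['slope', 'intercept', 'rvalue', 'pvalue', 'stderr']

# Returning a Series from a row-wise apply makes pandas expand it into
# columns, giving one row of statistics per original row.
row_stats = df.apply(lambda row: pd.Series(linregress(x, row), index=labels), axis=1)
print(row_stats)
```

Wrapping the result in a pd.Series with explicit labels avoids a separate rename step.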
Look at the following example:
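import numpy as np
import pandas as pd
from scipy.stats import linregress

np.random.seed(1997)
df = pd.DataFrame(np.random.rand(100, 10))
# Regress every column on the index. Each LinregressResult (a 5-tuple) becomes
# one column, and rename() labels the five result rows. (result_type='expand'
# only has an effect with axis=1, so it is not needed here.)
df.apply(lambda x: linregress(df.index, x)).rename(
    index={0: 'slope', 1: 'intercept', 2: 'rvalue', 3: 'p-value', 4: 'stderr'})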
It should return what you want: a DataFrame with one column per input column and one row per regression statistic.
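Note that df.apply still calls linregress once per column in Python. If that becomes slow, the same five statistics can be computed for all columns at once with NumPy array operations. This is a sketch under stated assumptions: linregress_columns is a hypothetical helper (not part of pandas or SciPy), the formulas mirror least-squares regression with a two-sided t-test on n - 2 degrees of freedom for the p-value, and constant or perfectly linear columns are not handled.

```python
import numpy as np
import pandas as pd
from scipy import stats


def linregress_columns(df):
    """Hypothetical helper: regress every column of df on df.index at once."""
    x = df.index.to_numpy(dtype=float)
    y = df.to_numpy(dtype=float)
    dof = len(x) - 2

    # Deviations from the means; y is centred column by column.
    xm = x - x.mean()
    ym = y - y.mean(axis=0)
    ssxm = xm @ xm                 # scalar: sum of squared x deviations
    ssym = (ym ** 2).sum(axis=0)   # per column: sum of squared y deviations
    ssxym = xm @ ym                # per column: sum of cross products

    slope = ssxym / ssxm
    intercept = y.mean(axis=0) - slope * x.mean()
    r = np.clip(ssxym / np.sqrt(ssxm * ssym), -1.0, 1.0)
    # Two-sided p-value for H0: slope == 0, using a t statistic with n - 2 dof.
    t = r * np.sqrt(dof / ((1.0 - r) * (1.0 + r)))
    pvalue = 2 * stats.t.sf(np.abs(t), dof)
    stderr = np.sqrt((1 - r ** 2) * ssym / ssxm / dof)

    return pd.DataFrame([slope, intercept, r, pvalue, stderr],
                        index=['slope', 'intercept', 'rvalue', 'pvalue', 'stderr'],
                        columns=df.columns)


np.random.seed(1997)
df = pd.DataFrame(np.random.rand(100, 10))
result = linregress_columns(df)
print(result)
```

Passing df.T instead of df regresses each row on the column labels.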