Reputation: 51
I would like to do below in python using dataframe and custom apply function.
Month,1,2,3,4,5
Oct 2018,0.1,0.2,0.3,0.4,NaN
Nov 2018,0.5,1.0,1.5,NaN,NaN
First row is column headers. I would like to take each row and do a linear fit and populate slope and intercept into a separate column. For example, "Oct 2018" row has x:[1,2,3,4],y:[0.1,0.2,0.3,0.4]
should give slope=0.1
and intercept=0.0
whereas "Nov 2018" row has x:[1,2,3], y:[0.5,1.0,1.5]
should give slope=0.5
and intercept=0.0
.
I am getting x values from column names. Thanks in advance.
Upvotes: 0
Views: 47
Reputation: 16683
You can get a new Slope column by using .stack
and doing a .groupby
of the month with a key thing being doing a .sum()
, doing a .rename
of the columns and using .eval
to calculate the slope. Finally, send the values to a list in order to set it back to the newly created 'Slope' column. For an intercept column you can just set the intercept is zero.
Data:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Month':['Oct', 'Nov'],
1: [0.1,0.5],
2: [0.2,1.0],
3: [0.3,1.5],
4: [0.4,np.NaN],
5: [np.NaN,np.NaN]})
Code:
df['Slope'] = pd.DataFrame(df.set_index('Month').stack()).reset_index().groupby('Month').sum().rename(columns={'level_1' : 'x', 0 : 'y'}).eval('Slope = y / x')['Slope'].to_list()
Output:
df
Month 1 2 3 4 5 Slope
0 Oct 0.1 0.2 0.3 0.4 NaN 0.5
1 Nov 0.5 1.0 1.5 NaN NaN 0.1
Upvotes: 0
Reputation: 155
def get_slope_interscept(x):
# Need two valid points to determine slope and interscept
first_index = x[1:,].first_valid_index()
second_index = x[first_index+1:,].first_valid_index()
x1, x2, y1, y2 = first_index, second_index, x[first_index], x[second_index]
slope = (y2-y1)/ (x2-x1)
interscept = y2 - slope*x2
return [slope, interscept]
df.apply(lambda x: pd.Series(get_slope_interscept(x), index=['slope', 'interscept']), axis=1)
This will also handle the case if there are invalid NaNs ahead in the row.
Upvotes: 1