user2340760
user2340760

Reputation: 51

Python dataframe custom apply function

I would like to do below in python using dataframe and custom apply function.

Month,1,2,3,4,5

Oct 2018,0.1,0.2,0.3,0.4,NaN

Nov 2018,0.5,1.0,1.5,NaN,NaN

First row is column headers. I would like to take each row and do a linear fit and populate slope and intercept into a separate column. For example, "Oct 2018" row has x:[1,2,3,4],y:[0.1,0.2,0.3,0.4] should give slope=0.1 and intercept=0.0 whereas "Nov 2018" row has x:[1,2,3], y:[0.5,1.0,1.5] should give slope=0.5 and intercept=0.0.

I am getting x values from column names. Thanks in advance.

Upvotes: 0

Views: 47

Answers (2)

David Erickson
David Erickson

Reputation: 16683

You can get a new Slope column by using .stack and doing a .groupbyof the month with a key thing being doing a .sum(), doing a .rename of the columns and using .eval to calculate the slope. Finally, send the values to a list in order to set it back to the newly created 'Slope' column. For an intercept column you can just set the intercept is zero.

Data:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Month':['Oct', 'Nov'],
                   1: [0.1,0.5],
                   2: [0.2,1.0],
                   3: [0.3,1.5],
                   4: [0.4,np.NaN],
                   5: [np.NaN,np.NaN]})

Code:

df['Slope'] = pd.DataFrame(df.set_index('Month').stack()).reset_index().groupby('Month').sum().rename(columns={'level_1' : 'x', 0 : 'y'}).eval('Slope = y / x')['Slope'].to_list()

Output:

df

    Month   1   2   3   4   5   Slope
0   Oct     0.1 0.2 0.3 0.4 NaN 0.5
1   Nov     0.5 1.0 1.5 NaN NaN 0.1

Upvotes: 0

utsavan
utsavan

Reputation: 155

def get_slope_interscept(x):
    # Need two valid points to determine slope and interscept
    first_index = x[1:,].first_valid_index()
    second_index = x[first_index+1:,].first_valid_index()
    x1, x2, y1, y2 = first_index, second_index, x[first_index], x[second_index]
    slope = (y2-y1)/ (x2-x1)
    interscept = y2 - slope*x2
    return [slope, interscept]

df.apply(lambda x: pd.Series(get_slope_interscept(x), index=['slope', 'interscept']), axis=1)

This will also handle the case if there are invalid NaNs ahead in the row.

Upvotes: 1

Related Questions