user2100039
user2100039

Reputation: 1356

Apply Function Panda Single Column & Groupby

I am trying to write and then apply a function that will multiply regression coefficients by the data value in a single column of a dataframe using 'groupby' on 2nd column in dataframe. My data looks like this in df:

  plant_name  year  month  wind_speed_ms
0  ARIZONA I  1991      1           6.99
1  ARIZONA I  1991      2           7.00
2  ARIZONA I  1991      3           6.41
3  ARIZONA I  1991      4           6.18
4  CAETITE II 1991      1           6.69
5  CAETITE II 1991      2           6.21

coeff is the panda below:

     plant_name    slopex  intercept
0     ARIZONA I   1.033109  -0.353939
1     CAETITE II  0.967414   0.194396

The results should look like this:

     plant_name   year   month   adjusted_wind_speed
0    ARIZONA I    1991   1       6.867
1    ARIZONA I    1991   2       6.878
2    ARIZONA I    1991   3       6.268
3    ARIZONA I    1991   4       6.031
4    CAETITE II   1991   1       6.666
5    CAETITE II   1991   2       6.202

My thinking is that this should be solvable by writing a function and applying it by using groupby on "plant_name but I'm getting this 'key error'

KeyError: 'slopex'

The code I'm trying:

def ADJUST(slopex, intercept):
    adj_df  = coeff.slopex * df["wind_speed_ms"] + coeff.intercept
    return adj_df

c_adj = (df.groupby(['plant_name','year','month'])[['wind_speed_ms']].apply(lambda x: 
ADJUST(x['slopex'],x['intercept']))).to_frame(name='c_adj').reset_index().round(2)

Upvotes: 0

Views: 58

Answers (1)

Mateusz Dorobek
Mateusz Dorobek

Reputation: 761

I've user this data in the following way. (I assume that you've meant CAETITE II instead of CAETITE I) in coefficients table.

df = pd.DataFrame(
    data=[
        ["ARIZONA I", 1991, 1, 6.99],
        ["ARIZONA I", 1991, 2, 7.00],
        ["ARIZONA I", 1991, 3, 6.41],
        ["ARIZONA I", 1991, 4, 6.18],
        ["CAETITE II",1991, 1, 6.69],
        ["CAETITE II",1991, 2, 6.21]
    ], 
    columns=['plant_name',  'year',  'month',  'wind_speed_ms'])
coeff_df = pd.DataFrame(
    data=[
        ['ARIZONA I', 1.033109, -0.353939],
        ['CAETITE II', 0.967414, 0.194396],
    ],  
    columns=['plant_name', 'slopex', 'intercept']
)

You just need simple merge:

df = df.merge(coeff_df, on='plant_name')
df['adjusted_wind_speed'] = df.wind_speed_ms * df.slopex + df.intercept
df
    plant_name  year    month   wind_speed_ms   slopex  intercept   adjusted_wind_speed
0   ARIZONA I   1991    1   6.99    1.033109    -0.353939   6.867493
1   ARIZONA I   1991    2   7.00    1.033109    -0.353939   6.877824
2   ARIZONA I   1991    3   6.41    1.033109    -0.353939   6.268290
3   ARIZONA I   1991    4   6.18    1.033109    -0.353939   6.030675
4   CAETITE II  1991    1   6.69    0.967414    0.194396    6.666396
5   CAETITE II  1991    2   6.21    0.967414    0.194396    6.202037

Upvotes: 1

Related Questions