I have a pandas column, initialized with ones, that represents the health of a solar panel.
I need this value to decay linearly, except at the hour when the panel is replaced, where the value resets to 1 (which is why I initialized the column with ones). Currently I loop through the column and set each value to the previous value minus a constant.
This is extremely slow, and I have over 200,000 samples. Is there a vectorized solution that avoids the for loop? Here is my code:
def set_degredation_factors_pv(df):
    # Each row = previous row minus the hourly rate, except row 0 and the
    # replacement hour, which keep their initial value of 1.
    for i in df.index:
        if i != replacement_duration_PV_year * hour_per_year and i != 0:
            df.loc[i, 'degradation_factor_PV_power_frac'] = df.loc[i-1, 'degradation_factor_PV_power_frac'] - degradation_rate_PV_power_perc_per_hour/100
    return df
Variables:
replacement_duration_PV_year = 25
hour_per_year = 8760
degradation_rate_PV_power_perc_per_hour = 5.479e-5
Input data:
time_datetime degradation_factor_PV_power_frac
0 2022-01-01 00:00:00 1
1 2022-01-01 01:00:00 1
2 2022-01-01 02:00:00 1
3 2022-01-01 03:00:00 1
4 2022-01-01 04:00:00 1
... ... ...
8732 2022-12-30 20:00:00 1
8733 2022-12-30 21:00:00 1
8734 2022-12-30 22:00:00 1
8735 2022-12-30 23:00:00 1
8736 2022-12-31 00:00:00 1
Expected output (showing only the first year):
time_datetime degradation_factor_PV_power_frac
0 2022-01-01 00:00:00 1.000000
1 2022-01-01 01:00:00 0.999999
2 2022-01-01 02:00:00 0.999999
3 2022-01-01 03:00:00 0.999998
4 2022-01-01 04:00:00 0.999998
... ... ...
8732 2022-12-30 20:00:00 0.995216
8733 2022-12-30 21:00:00 0.995215
8734 2022-12-30 22:00:00 0.995215
8735 2022-12-30 23:00:00 0.995214
8736 2022-12-31 00:00:00 0.995214
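Because every non-reset row subtracts the same constant from the row before it, the loop has a closed form: the factor at hour i is 1 minus the hourly rate times the hours elapsed since the most recent reset (row 0, or the single replacement hour at 25 * 8760). Below is a minimal NumPy sketch of that idea. It assumes the column starts as all ones and the index is a 0-based RangeIndex with one row per hour, as in the sample above. `degradation_factor_closed_form` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

replacement_duration_PV_year = 25
hour_per_year = 8760
degradation_rate_PV_power_perc_per_hour = 5.479e-5


def degradation_factor_closed_form(df):
    # Hypothetical helper: vectorized equivalent of the row-by-row loop.
    # Assumes df.index is 0, 1, 2, ... (one row per hour) and the column
    # was initialized with ones.
    rate = degradation_rate_PV_power_perc_per_hour / 100
    reset_hour = replacement_duration_PV_year * hour_per_year
    idx = df.index.to_numpy()
    # Hours since the most recent reset: row 0, or the replacement hour once reached.
    hours_since_reset = np.where(idx >= reset_hour, idx - reset_hour, idx)
    df['degradation_factor_PV_power_frac'] = 1 - hours_since_reset * rate
    return df


# One year of hourly rows, matching the sample input.
df = pd.DataFrame({
    'time_datetime': pd.date_range('2022-01-01', periods=8737,
                                   freq=pd.Timedelta(hours=1)),
    'degradation_factor_PV_power_frac': 1.0,
})
df = degradation_factor_closed_form(df)
print(df['degradation_factor_PV_power_frac'].iloc[-1].round(6))  # 0.995214
```

Like the original loop, this resets only once. For a panel replaced every 25 years, `idx % reset_hour` could replace the `np.where` call.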
Try:
rate = degradation_rate_PV_power_perc_per_hour / 100
# mask is True at row 0 and at the replacement hour; its cumsum labels each decay period
mask = ~((df.index != replacement_duration_PV_year * hour_per_year)
         & (df.index != 0))
# transform returns a result aligned to df's original index
df['degradation_factor_PV_power_frac'] = (
    df.groupby(mask.cumsum())['degradation_factor_PV_power_frac']
      .transform(lambda x: x.shift().sub(rate).cumprod())
      .fillna(df['degradation_factor_PV_power_frac'])
)
Output:
>>> df
time_datetime degradation_factor_PV_power_frac
0 2022-01-01 00:00:00 1.000000
1 2022-01-01 01:00:00 0.999999
2 2022-01-01 02:00:00 0.999999
3 2022-01-01 03:00:00 0.999998
4 2022-01-01 04:00:00 0.999998