Reputation: 9
I have two years worth of sensor data in a pandas dataframe. The index is a time series. Looks like this:
temp1 overtemp time_to_overtemp
datetime
2019-01-02 09:31:00 305.96
2019-01-02 09:32:00 305.98
2019-01-02 09:33:00 305.70
2019-01-02 09:34:00 305.30
2019-01-02 09:35:00 306.88
What I want to do is loop over the time series to populate the "overtemp" and "time_to_overtemp columns. "overtemp" needs to be assigned a 1 if the temperature data anytime in the next two weeks increases by more than 2%. "time_to_overtemp" needs to show the time of the next >2% reading, if it exists in the next two weeks. If the temperature says within 2% for the next two weeks, both columns should be assigned a 0.
For example 2019-01-02 09:31:00 should look at the next two weeks worth of temperature data and put a 0 in both columns because all data in that time period is within 2% of the value. The overtemp value for 2020-01-02 09:35:00 should be a 1 because the value increased by 5% a week later. The time_to_overtemp value should indicate 7 days, 2 hours, 38 minutes because thats when the overtemp occured.
I am successfully doing some more math stuff using iterrows:
for datetime, row in df.iterrows():
but its taking forever. And I am not getting how to do the time iterations and calculations at all yet.
I have done other labeling with:
df['overtemp'] = np.select([df['temp1']<305, df['temp1']>305], [1,0])
I guess this vectorizes the process? It sure works a lot faster than iterating. But I can't figure out how to implement the datetime+two week portion.
Upvotes: 1
Views: 150
Reputation: 3739
Think you could do it with pandas rolling by applying funcs, here is my go, hope this is what you need:
# create random data
ind = pd.date_range(start = pd.to_datetime("2019-01-02 09:31:00"), periods=28*24*60, freq='min')
v = [random.randint(30000, 32000)/100 for x in ind]
df = pd.DataFrame(v, index=ind, columns = ['temp1'])
# define funcs
def overtemp(S):
l = S[S>=S[-1]*1.02]
if len(l)>0:
return l[-1]
else:
return 0
def overtemp_seconds(S):
l = np.argwhere(S.values>=S.values[-1]*1.02)
if len(l)>0:
i = l[-1][0]
delta = S.index[i] - S.index[-1]
return delta.seconds
else:
return 0
# apply funcs to time window (reversed cause rolling gives you time window till offset)
over_temp = df[::-1].rolling('14D').apply(overtemp)[::-1]['temp1']
seconds_to_over_temp = df[::-1].rolling('14D').apply(overtemp_seconds)[::-1]['temp1']
# add to orig df
df['over_temp'] = over_temp
df['seconds_to_over_temp'] = seconds_to_over_temp
Upvotes: 0
Reputation: 571
Here is a suggestion on how to approach your problem by using apply and indexing.
But it is not a very fast solution indeed, maybe pandas have a better function to do this.
Or you can parallelize the apply function to make it faster.
df = pd.DataFrame(pd.date_range(start='2020-01-01', end='2020-03-01', freq='H'), columns=['datetime'])
df['temp'] = [np.random.uniform(low=300, high=310) for _ in range(df.shape[0])]
def get_overtemp(row):
now = row.datetime
next_14_days = now + timedelta(days=14)
temp = row.temp
filtered = df[
(df['datetime'] > now) &
(df['datetime'] <= next_14_days) &
(df['temp'] > temp * 1.02)
]
overtemp = len(filtered) > 0
time_to_overtemp = None
if overtemp:
time_to_overtemp = filtered['datetime'].values[0] - now
return pd.Series([overtemp, time_to_overtemp])
df[['overtemp', 'time_to_overtemp']] = df.apply(get_overtemp, axis=1)
df.head(20)
Result:
| | datetime | temp | overtemp | time_to_overtemp |
|---:|:--------------------|--------:|:-----------|:-------------------|
| 0 | 2020-01-01 00:00:00 | 309.502 | False | NaT |
| 1 | 2020-01-01 01:00:00 | 303.816 | True | 7 days 00:00:00 |
| 2 | 2020-01-01 02:00:00 | 307.297 | False | NaT |
| 3 | 2020-01-01 03:00:00 | 306.252 | False | NaT |
| 4 | 2020-01-01 04:00:00 | 303.458 | True | 0 days 07:00:00 |
| 5 | 2020-01-01 05:00:00 | 304.27 | False | NaT |
| 6 | 2020-01-01 06:00:00 | 300.98 | True | 0 days 05:00:00 |
| 7 | 2020-01-01 07:00:00 | 306.652 | False | NaT |
| 8 | 2020-01-01 08:00:00 | 304.107 | False | NaT |
| 9 | 2020-01-01 09:00:00 | 300.077 | True | 0 days 02:00:00 |
See: https://github.com/nalepae/pandarallel
Upvotes: 1