Windstorm1981
Windstorm1981

Reputation: 2680

Identify Updated Value in Time Series Data Python Pandas

I don't do a lot of time series work and I know the way I'm thinking about this solution is sub-optimal. Wanted to get input as to the most efficient way to approach this issue.

I have several days of values with multiple values per day identified by a time stamp.

Data looks like this:

Index      Period     Value         Timestamp
  0          1          73          2017-08-10 16:44:23
  1          1          73          2017-08-09 16:30:12
  2          1          73          2017-08-08 16:40:31
  3          2          50          2017-08-10 16:44:23
  4          2          45          2017-08-09 16:30:12
  5          2          45          2017-08-08 16:40:31
  6          3          13          2017-08-10 16:44:23
  7          3          13          2017-08-09 16:30:12
  8          3          13          2017-08-08 16:40:31

The example shows one data element for three different periods captured three days in a row. The idea is determining if the value for any of the measured period (Period 1, 2, or 3) changes.

As you can see in the example, on the third day (2017-08-10) The value for Period 2 was updated. I want to detect that changed value.

The only way I can figure out how do compare is to loop through which I think is both inelegant, inefficient and definitely not Pythonic.

Anyone have insight into a means of approach without looping/iteration?

Thanks in advance.

EDIT

Expected Output would be a df as follows if there is a value change in the most recent timestamped data:

Index      Period     Value         Timestamp
  0          1          73          2017-08-10 16:44:23
  3          2          50          2017-08-10 16:44:23
  6          3          13          2017-08-10 16:44:23

Upvotes: 1

Views: 1030

Answers (1)

JohnE
JohnE

Reputation: 30424

First, you can identify rows with a change like this:

df['diff'] = df.groupby('Period')['Value'].diff(-1).fillna(0)

   Period  Value            Timestamp  diff
0       1     73  2017-08-10 16:44:23   0.0
1       1     73  2017-08-09 16:30:12   0.0
2       1     73  2017-08-08 16:40:31   0.0
3       2     50  2017-08-10 16:44:23   5.0
4       2     45  2017-08-09 16:30:12   0.0
5       2     45  2017-08-08 16:40:31   0.0
6       3     13  2017-08-10 16:44:23   0.0
7       3     13  2017-08-09 16:30:12   0.0
8       3     13  2017-08-08 16:40:31   0.0

Then select rows to display (all rows with the same timestamp as a row with a change):

lst = df[ df['diff'] != 0. ]['Timestamp'].tolist()
df[ df['Timestamp'].isin(lst) ]

   Period  Value            Timestamp  diff
0       1     73  2017-08-10 16:44:23   0.0
3       2     50  2017-08-10 16:44:23   5.0
6       3     13  2017-08-10 16:44:23   0.0

Upvotes: 3

Related Questions