Create new pandas timeseries dataframe from other dataframe

Question

How can I create a new pandas timeseries dataframe from one existing df.

Say event A started 11/28 11:35 and ended 11/29 19:53 which is count 1. Again event A 2nd instance started 11/28 11:37 and ended 11/29 19:53 - counts another 1. So I increased value of A to 2. (Sorry data entry was mistakenly 11/28 instead it would be 11/29)

Source df given with start and end time of an event. And same event can happen multiple times at the same time. New df should have a time series of cumulative count of event on a given minute ranging from Min(Start Time) to Max(End Time).

Source Df:

Start-Time       |  End-Time         | Event
11/28/2014 11:35 |  11/29/2014 19:53 | A
11/28/2014 11:36 |  11/28/2014 11:37 | B
11/28/2014 11:32 |  11/28/2014 19:53 | C
11/28/2014 11:37 |  11/28/2014 19:53 | A
......

New Df:

TimeStamp        | A |  B | C
11/28/2014 11:35 | 1 |  0 | 1
11/28/2014 11:36 | 1 |  1 | 1
11/28/2014 11:37 | 2 |  1 | 1
.....
11/29/2014 19:53 | 2 |  0 | 1

DSM · Accepted Answer

This is a little tricky because you want the end time to count as an "on" state, but I think something like this should work (warning: I've spent exactly zero time considering strange edge cases, so buyer beware):

df = pd.melt(df, id_vars="Event", var_name="Which", value_name="Time")
df["Signal"] = df.pop("Which").replace({"Start-Time": 1, "End-Time": -1})
pivoted = df.pivot(columns="Event", index="Time").fillna(0)
pivoted = pivoted.sort_index() # just in case; can't remember if this is guaranteed
df_out = pivoted.cumsum() + (pivoted == -1)

which produces

>>> df_out
                 Signal      
Event                 A  B  C
Time                         
11/28/2014 11:32      0  0  1
11/28/2014 11:35      1  0  1
11/28/2014 11:36      1  1  1
11/28/2014 11:37      2  1  1
11/28/2014 19:53      2  0  1
11/29/2014 19:53      1  0  0

The basic idea is to add a signed "Signal" column and use that to track the changes:

>>> df
  Event              Time  Signal
0     A  11/28/2014 11:35       1
1     B  11/28/2014 11:36       1
2     C  11/28/2014 11:32       1
3     A  11/28/2014 11:37       1
4     A  11/29/2014 19:53      -1
5     B  11/28/2014 11:37      -1
6     C  11/28/2014 19:53      -1
7     A  11/28/2014 19:53      -1

which we can then pivot to get the state changes:

>>> pivoted
                 Signal      
Event                 A  B  C
Time                         
11/28/2014 11:32      0  0  1
11/28/2014 11:35      1  0  0
11/28/2014 11:36      0  1  0
11/28/2014 11:37      1 -1  0
11/28/2014 19:53     -1  0 -1
11/29/2014 19:53     -1  0  0

and accumulate to get the state:

>>> pivoted.cumsum()
                 Signal      
Event                 A  B  C
Time                         
11/28/2014 11:32      0  0  1
11/28/2014 11:35      1  0  1
11/28/2014 11:36      1  1  1
11/28/2014 11:37      2  0  1
11/28/2014 19:53      1  0  0
11/29/2014 19:53      0  0  0

This is almost what we want, but you want the end time to be included, and so we can lag the effects by undoing the shutoff:

>>> pivoted.cumsum() + (pivoted == -1)
                 Signal      
Event                 A  B  C
Time                         
11/28/2014 11:32      0  0  1
11/28/2014 11:35      1  0  1
11/28/2014 11:36      1  1  1
11/28/2014 11:37      2  1  1
11/28/2014 19:53      2  0  1
11/29/2014 19:53      1  0  0

Create new pandas timeseries dataframe from other dataframe

Answers (2)

Related Questions