bigbro

Reputation: 203

Create a new pandas time-series dataframe from another dataframe

How can I create a new pandas time-series dataframe from an existing df?

Say event A started at 11/28 11:35 and ended at 11/29 19:53, which counts as 1. A second instance of event A started at 11/28 11:37 and ended at 11/29 19:53, which counts another 1, so the value of A increases to 2. (Sorry, one entry was mistakenly typed as 11/28; it should be 11/29.)

The source df gives the start and end time of each event, and the same event can be in progress multiple times at once. The new df should be a time series of the cumulative count of each event for every minute from min(Start-Time) to max(End-Time).

Source Df:

Start-Time       |  End-Time         | Event
11/28/2014 11:35 |  11/29/2014 19:53 | A
11/28/2014 11:36 |  11/28/2014 11:37 | B
11/28/2014 11:32 |  11/28/2014 19:53 | C
11/28/2014 11:37 |  11/28/2014 19:53 | A
......

New Df:

TimeStamp        | A |  B | C
11/28/2014 11:35 | 1 |  0 | 1
11/28/2014 11:36 | 1 |  1 | 1
11/28/2014 11:37 | 2 |  1 | 1
.....
11/29/2014 19:53 | 2 |  0 | 1
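
For reference, this is roughly how the sample data above could be set up and how the per-minute range could be built (column names are taken from the table; the month/day/year datetime format is assumed):

import pandas as pd

# sample rows from the source table above; timestamps are strings
df = pd.DataFrame({
    'Start-Time': ['11/28/2014 11:35', '11/28/2014 11:36',
                   '11/28/2014 11:32', '11/28/2014 11:37'],
    'End-Time':   ['11/29/2014 19:53', '11/28/2014 11:37',
                   '11/28/2014 19:53', '11/28/2014 19:53'],
    'Event':      ['A', 'B', 'C', 'A'],
})

# parse to real timestamps so min/max and a minute-frequency range work
df['Start-Time'] = pd.to_datetime(df['Start-Time'], format='%m/%d/%Y %H:%M')
df['End-Time'] = pd.to_datetime(df['End-Time'], format='%m/%d/%Y %H:%M')

# the new df should have one row per minute in this range
minutes = pd.date_range(df['Start-Time'].min(), df['End-Time'].max(), freq='min')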

Upvotes: 4

Views: 268

Answers (2)

alacy

Reputation: 5064

Here is a slightly different approach from @DSM's. I stack the start and end columns on top of each other, then group by time and event and aggregate with len to count occurrences. Finally, to get the desired layout, I pivot the table.

import pandas as pd

# simplified example data: bare minute numbers standing in for the full timestamps
start = [35, 36, 37, 36, 35]
end = [56, 56, 56, 58, 58]
events = ['A', 'B', 'C', 'A', 'A']

df = pd.DataFrame({'start': start, 'end': end, 'events': events})

# stack the 'start' and 'end' columns on top of each other, repeating the
# 'events' column so every time value keeps its event label
new_df = pd.DataFrame({'times': df['start'].append(df['end']),
                       'events': df['events'].append(df['events'])})

new_df = new_df.groupby(['times', 'events']).agg(len)

# massage the data frame to conform to desired output
new_df = new_df.reset_index().pivot('times', 'events').fillna(0)

The concatenated data frame looks like:

  events  times
0      A     35
1      B     36
2      C     37
3      A     36
4      A     35
0      A     56
1      B     56
2      C     56
3      A     58
4      A     58

The data frame after grouping and aggregating:

times  events
35     A         2
36     A         1
       B         1
37     C         1
56     A         1
       B         1
       C         1
58     A         2

And finally the data frame after the pivot:

events  A  B  C
times          
35      2  0  0
36      1  1  0
37      0  0  1
56      1  1  1
58      2  0  0
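
As a side note, Series.append has since been deprecated and removed (in pandas 2.0), so on current pandas the stacked frame would be built with pd.concat instead; a sketch of the equivalent construction:

# pandas >= 2.0: Series.append is gone, so stack the columns with pd.concat
new_df = pd.DataFrame({
    'times': pd.concat([df['start'], df['end']]),
    'events': pd.concat([df['events'], df['events']]),
})

From there, new_df.groupby(['times', 'events']).size().unstack(fill_value=0) gives the same counts table in one step, if you prefer that to agg(len) followed by the pivot.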

I imagine that @DSM's solution is more efficient than mine with regard to computational time, since the append method is rather costly: it constructs an entirely new object on each call. I haven't timed either method, though, so I don't know for sure.
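
For what it's worth, the cost of the append-based stacking itself could be checked with something along these lines (a rough sketch only: it measures just the stacking step, not either full pipeline, and the append version needs pandas < 2.0):

import timeit

import pandas as pd

# larger synthetic frame so the timing is not pure overhead
big = pd.DataFrame({'start': range(100000),
                    'end': range(100000, 200000),
                    'events': ['A'] * 100000})

def stack_with_append():
    return pd.DataFrame({'times': big['start'].append(big['end']),
                         'events': big['events'].append(big['events'])})

def stack_with_concat():
    return pd.DataFrame({'times': pd.concat([big['start'], big['end']]),
                         'events': pd.concat([big['events'], big['events']])})

print('append:', timeit.timeit(stack_with_append, number=100))
print('concat:', timeit.timeit(stack_with_concat, number=100))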

Upvotes: 1

DSM

Reputation: 353009

This is a little tricky because you want the end time to count as an "on" state, but I think something like this should work (warning: I've spent exactly zero time considering strange edge cases, so buyer beware):

# df is the source frame from the question (Start-Time, End-Time, Event)
# melt to long form: one row per (event, time), tagged with the column it came from
df = pd.melt(df, id_vars="Event", var_name="Which", value_name="Time")
# +1 when an event switches on, -1 when it switches off
df["Signal"] = df.pop("Which").replace({"Start-Time": 1, "End-Time": -1})
pivoted = df.pivot(columns="Event", index="Time").fillna(0)
pivoted = pivoted.sort_index()  # just in case; can't remember if this is guaranteed
df_out = pivoted.cumsum() + (pivoted == -1)

which produces

>>> df_out
                 Signal      
Event                 A  B  C
Time                         
11/28/2014 11:32      0  0  1
11/28/2014 11:35      1  0  1
11/28/2014 11:36      1  1  1
11/28/2014 11:37      2  1  1
11/28/2014 19:53      2  0  1
11/29/2014 19:53      1  0  0

The basic idea is to add a signed "Signal" column and use that to track the changes:

>>> df
  Event              Time  Signal
0     A  11/28/2014 11:35       1
1     B  11/28/2014 11:36       1
2     C  11/28/2014 11:32       1
3     A  11/28/2014 11:37       1
4     A  11/29/2014 19:53      -1
5     B  11/28/2014 11:37      -1
6     C  11/28/2014 19:53      -1
7     A  11/28/2014 19:53      -1

which we can then pivot to get the state changes:

>>> pivoted
                 Signal      
Event                 A  B  C
Time                         
11/28/2014 11:32      0  0  1
11/28/2014 11:35      1  0  0
11/28/2014 11:36      0  1  0
11/28/2014 11:37      1 -1  0
11/28/2014 19:53     -1  0 -1
11/29/2014 19:53     -1  0  0

and accumulate to get the state:

>>> pivoted.cumsum()
                 Signal      
Event                 A  B  C
Time                         
11/28/2014 11:32      0  0  1
11/28/2014 11:35      1  0  1
11/28/2014 11:36      1  1  1
11/28/2014 11:37      2  0  1
11/28/2014 19:53      1  0  0
11/29/2014 19:53      0  0  0

This is almost what we want, but you want the end time to be included, and so we can lag the effects by undoing the shutoff:

>>> pivoted.cumsum() + (pivoted == -1)
                 Signal      
Event                 A  B  C
Time                         
11/28/2014 11:32      0  0  1
11/28/2014 11:35      1  0  1
11/28/2014 11:36      1  1  1
11/28/2014 11:37      2  1  1
11/28/2014 19:53      2  0  1
11/29/2014 19:53      1  0  0
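
The frame above only has rows at the change points. If you also need one row for every minute between min(Start-Time) and max(End-Time), as the question describes, a possible follow-up sketch (assuming the Time index is still strings in the question's format; the column names come from the output above) is to drop the extra "Signal" column level, reindex to minute frequency, and forward-fill:

# drop the 'Signal' level so the columns are just A, B, C
counts = df_out["Signal"]

# make sure the index is real timestamps, then expand to one row per minute
counts.index = pd.to_datetime(counts.index, format="%m/%d/%Y %H:%M")
full_index = pd.date_range(counts.index.min(), counts.index.max(), freq="min")
counts = counts.reindex(full_index).ffill().astype(int)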

Upvotes: 3
