Reputation: 203
How can I create a new pandas time-series dataframe from an existing df?
Say event A starts at 11/28 11:35 and ends at 11/29 19:53, which counts as 1. A second instance of event A starts at 11/28 11:37 and ends at 11/29 19:53, which counts as another 1, so the value of A rises to 2. (Sorry, one of the dates was entered as 11/28 by mistake; it should be 11/29.)
The source df gives the start and end time of each event, and the same event can occur multiple times, even simultaneously. The new df should have a time series of the cumulative count of each event at a given minute, ranging from min(Start-Time) to max(End-Time).
Source Df:
Start-Time | End-Time | Event
11/28/2014 11:35 | 11/29/2014 19:53 | A
11/28/2014 11:36 | 11/28/2014 11:37 | B
11/28/2014 11:32 | 11/28/2014 19:53 | C
11/28/2014 11:37 | 11/28/2014 19:53 | A
......
New Df:
TimeStamp | A | B | C
11/28/2014 11:35 | 1 | 0 | 1
11/28/2014 11:36 | 1 | 1 | 1
11/28/2014 11:37 | 2 | 1 | 1
.....
11/29/2014 19:53 | 2 | 0 | 1
Upvotes: 4
Views: 268
Reputation: 5064
Here is a slightly different approach than @DSM's. I stack the start and end columns on top of each other, count each (times, events) pair with a groupby, and then pivot the table to get the desired output.
import pandas as pd

start = [35, 36, 37, 36, 35]
end = [56, 56, 56, 58, 58]
events = ['A', 'B', 'C', 'A', 'A']
df = pd.DataFrame({'start': start, 'end': end, 'events': events})
# stack the 'start' and 'end' columns on top of each other
new_df = pd.DataFrame({'times': pd.concat([df['start'], df['end']]),
                       'events': pd.concat([df['events'], df['events']])})
# count how many times each (times, events) pair occurs
new_df = new_df.groupby(['times', 'events']).size()
# massage the data frame to conform to the desired output
new_df = (new_df.reset_index(name='count')
                .pivot(index='times', columns='events', values='count')
                .fillna(0).astype(int))
The concatenated data frame looks like:
   times events
0     35      A
1     36      B
2     37      C
3     36      A
4     35      A
0     56      A
1     56      B
2     56      C
3     58      A
4     58      A
The result after the groupby:
times  events
35     A         2
36     A         1
       B         1
37     C         1
56     A         1
       B         1
       C         1
58     A         2
dtype: int64
And finally the data frame after the pivot:
events A B C
times
35 2 0 0
36 1 1 0
37 0 0 1
56 1 1 1
58 2 0 0
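For what it's worth, the same pipeline runs unchanged on the question's real timestamps. Here is a minimal sketch, assuming the dates from the Source Df parse with pd.to_datetime in month-first order:
import pandas as pd

# the question's four rows, typed in as real timestamps instead of toy integers
df = pd.DataFrame({
    'start': pd.to_datetime(['11/28/2014 11:35', '11/28/2014 11:36',
                             '11/28/2014 11:32', '11/28/2014 11:37']),
    'end': pd.to_datetime(['11/29/2014 19:53', '11/28/2014 11:37',
                           '11/28/2014 19:53', '11/28/2014 19:53']),
    'events': ['A', 'B', 'C', 'A'],
})
new_df = pd.DataFrame({'times': pd.concat([df['start'], df['end']]),
                       'events': pd.concat([df['events'], df['events']])})
new_df = (new_df.groupby(['times', 'events']).size()
                .reset_index(name='count')
                .pivot(index='times', columns='events', values='count')
                .fillna(0).astype(int))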
I imagine that @DSM's solution is more efficient than mine in terms of computational time, since stacking the columns requires constructing an entirely new object. I haven't timed either method, though, so I don't know for sure.
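If you do want to check, the standard library's timeit makes a rough comparison easy. A sketch on the toy data above (timings on a frame this small won't generalize to real workloads):
import timeit

setup = """
import pandas as pd
df = pd.DataFrame({'start': [35, 36, 37, 36, 35],
                   'end': [56, 56, 56, 58, 58],
                   'events': ['A', 'B', 'C', 'A', 'A']})
melt_df = pd.DataFrame({'Start-Time': [35, 36, 32, 37],
                        'End-Time': [93, 37, 53, 53],
                        'Event': ['A', 'B', 'C', 'A']})
"""
stack = """
new_df = pd.DataFrame({'times': pd.concat([df['start'], df['end']]),
                       'events': pd.concat([df['events'], df['events']])})
(new_df.groupby(['times', 'events']).size()
       .reset_index(name='count')
       .pivot(index='times', columns='events', values='count').fillna(0))
"""
melt = """
m = pd.melt(melt_df, id_vars='Event', var_name='Which', value_name='Time')
m['Signal'] = m.pop('Which').replace({'Start-Time': 1, 'End-Time': -1})
p = m.pivot(columns='Event', index='Time').fillna(0).sort_index()
out = p.cumsum() + (p == -1)
"""
print('stack approach:', timeit.timeit(stack, setup=setup, number=1000))
print('melt approach: ', timeit.timeit(melt, setup=setup, number=1000))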
Upvotes: 1
Reputation: 353009
This is a little tricky because you want the end time to count as an "on" state, but I think something like this should work (warning: I've spent exactly zero time considering strange edge cases, so buyer beware):
import pandas as pd

# df is the question's source frame with Start-Time, End-Time, and Event columns
df = pd.melt(df, id_vars="Event", var_name="Which", value_name="Time")
# +1 when an event turns on, -1 when it turns off
df["Signal"] = df.pop("Which").replace({"Start-Time": 1, "End-Time": -1})
pivoted = df.pivot(columns="Event", index="Time").fillna(0)
pivoted = pivoted.sort_index()  # just in case; can't remember if this is guaranteed
df_out = pivoted.cumsum() + (pivoted == -1)
which produces
>>> df_out
Signal
Event A B C
Time
11/28/2014 11:32 0 0 1
11/28/2014 11:35 1 0 1
11/28/2014 11:36 1 1 1
11/28/2014 11:37 2 1 1
11/28/2014 19:53 2 0 1
11/29/2014 19:53 1 0 0
The basic idea is to add a signed "Signal" column and use that to track the changes:
>>> df
Event Time Signal
0 A 11/28/2014 11:35 1
1 B 11/28/2014 11:36 1
2 C 11/28/2014 11:32 1
3 A 11/28/2014 11:37 1
4 A 11/29/2014 19:53 -1
5 B 11/28/2014 11:37 -1
6 C 11/28/2014 19:53 -1
7 A 11/28/2014 19:53 -1
which we can then pivot to get the state changes:
>>> pivoted
Signal
Event A B C
Time
11/28/2014 11:32 0 0 1
11/28/2014 11:35 1 0 0
11/28/2014 11:36 0 1 0
11/28/2014 11:37 1 -1 0
11/28/2014 19:53 -1 0 -1
11/29/2014 19:53 -1 0 0
and accumulate to get the state:
>>> pivoted.cumsum()
Signal
Event A B C
Time
11/28/2014 11:32 0 0 1
11/28/2014 11:35 1 0 1
11/28/2014 11:36 1 1 1
11/28/2014 11:37 2 0 1
11/28/2014 19:53 1 0 0
11/29/2014 19:53 0 0 0
This is almost what we want, but you want the end time to be included, so we keep each event "on" through its end minute by adding 1 back at the rows where its signal is -1:
>>> pivoted.cumsum() + (pivoted == -1)
Signal
Event A B C
Time
11/28/2014 11:32 0 0 1
11/28/2014 11:35 1 0 1
11/28/2014 11:36 1 1 1
11/28/2014 11:37 2 1 1
11/28/2014 19:53 2 0 1
11/29/2014 19:53 1 0 0
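One loose end relative to the question: df_out only has rows at the minutes where something changes, while the question asked for every minute from min(Start-Time) to max(End-Time). A minute-frequency reindex plus a forward fill closes that gap. A small sketch, assuming the Time index parses cleanly with pd.to_datetime and the data falls on whole minutes:
# expand the change-point rows to one row per minute
df_out.index = pd.to_datetime(df_out.index)
full_range = pd.date_range(df_out.index.min(), df_out.index.max(), freq='min')
per_minute = df_out.reindex(full_range).ffill().astype(int)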
Upvotes: 3