Reputation:
Imagine a snippet of the follow data frame:
ID ActivityName Time Type Shape
1 1 Request 0.000 Type_1 767
2 1 Request 600.000 Type_1 767
3 1 Start 600.000 Type_1 1376
4 1 Start 600.000 Type_1 1376
5 1 Schedule Activities 600.000 Type_1 15
6 1 Schedule Activities 2062.295 Type_1 15
What I'm trying to do is to create two new columns based on the repeating entries in ActivityName
.
Specifically, I want to combine two subsequent rows for the same activity into one row with a start and
complete timestamp (from Time
, in seconds.)
Given that not all entries in ActivityName
have a matching second entry (however, max two consecutive entries are identical), I would also like to delete such "single-standing" rows.
P.s. Although not seen in the data frame snippet, all levels of ActivityName occur repeatedly, whether in consecutive-identical or in single fashion.
Any ideas for how to go about this will be highly appreciated.
Upvotes: 0
Views: 45
Reputation: 5620
Supposing that ID
is a variable that indicates which entries from ActivityName
should be grouped, this should work:
library(tidyverse)
df %>%
#Group by ID and ActivityName
group_by(ID, ActivityName) %>%
#Stay only with entries with more than 1 row
filter(n() > 1) %>%
#Put the min value of Time as Start and the max value as Timestamp
summarize (Start = min(Time),
Timestamp = max(Time))
Upvotes: 2