user12928769
user12928769

Reputation:

Filtering for two identical consecutive entries in a column

Imagine a snippet of the follow data frame:

       ID        ActivityName     Time         Type    Shape 
1       1             Request    0.000       Type_1      767           
2       1             Request  600.000       Type_1      767           
3       1               Start  600.000       Type_1     1376           
4       1               Start  600.000       Type_1     1376           
5       1 Schedule Activities  600.000       Type_1       15           
6       1 Schedule Activities 2062.295       Type_1       15  

What I'm trying to do is to create two new columns based on the repeating entries in ActivityName.

Specifically, I want to combine two subsequent rows for the same activity into one row with a start and complete timestamp (from Time, in seconds.)

Given that not all entries in ActivityName have a matching second entry (however, max two consecutive entries are identical), I would also like to delete such "single-standing" rows.

P.s. Although not seen in the data frame snippet, all levels of ActivityName occur repeatedly, whether in consecutive-identical or in single fashion.

Any ideas for how to go about this will be highly appreciated.

Upvotes: 0

Views: 45

Answers (1)

Supposing that ID is a variable that indicates which entries from ActivityName should be grouped, this should work:

library(tidyverse)

df %>%
  #Group by ID and ActivityName
  group_by(ID, ActivityName) %>%
  #Stay only with entries with more than 1 row
  filter(n() > 1) %>%
  #Put the min value of Time as Start and the max value as Timestamp 
  summarize (Start = min(Time),
             Timestamp = max(Time))

Upvotes: 2

Related Questions