dloudtrain
dloudtrain

Reputation: 301

Subset and group for specific cases in R

I have a dataframe [df] like this:

id   device   date                speed    incident
1    B3       2020-04-15 08:00    23          0
2    B3       2020-04-15 09:00    21          0
3    B3       2020-04-15 10:00    54          1
4    B3       2020-04-15 11:00    52          1
5    B3       2020-04-15 12:00    24          0
6    B3       2020-04-15 13:00    12          0
7    B3       2020-04-16 09:00    51          1
8    B3       2020-04-16 10:00    16          0
9    B3       2020-04-16 11:00    20          0
10   B3       2020-04-16 12:00    21          0
11   B3       2020-04-16 13:00    19          0

I would like to know whether there is a way of subset the data so that only rows where incident = 1 remain with the one before and after the incident and assign an id for each incident group.

The preferred result would look like this:

id   device   date                pressure    warning   group
2    B3       2020-04-15 09:00    21          0           1
3    B3       2020-04-15 10:00    54          1           1
4    B3       2020-04-15 11:00    52          1           1
5    B3       2020-04-15 12:00    24          0           1 
6    B3       2020-04-15 13:00    12          0           2
7    B3       2020-04-16 09:00    51          1           2
8    B3       2020-04-16 10:00    16          0           2

Thank you very much for any suggestions.

Upvotes: 0

Views: 42

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389275

Here is one base R approach :

#Get row numbers where incident = 1
ones <- which(df$incident == 1)
#Create groups of consecutive ones
inds <- split(ones,cumsum(c(TRUE, diff(ones) > 1)))

#subset the dataframe by taking -1, +1 of inds in each list
#Create a group column and combine the data in one dataframe
do.call(rbind, Map(function(x, y) 
               transform(df[c(min(x) - 1, x, max(x) + 1), ], group = y), 
               inds, names(inds)))

#    id device            date speed incident group
#1.2  2     B3 2020-04-1509:00    21        0     1
#1.3  3     B3 2020-04-1510:00    54        1     1
#1.4  4     B3 2020-04-1511:00    52        1     1
#1.5  5     B3 2020-04-1512:00    24        0     1
#2.6  6     B3 2020-04-1513:00    12        0     2
#2.7  7     B3 2020-04-1609:00    51        1     2
#2.8  8     B3 2020-04-1610:00    16        0     2

Upvotes: 1

Related Questions