Reputation: 151
I have a dataset with id and speed.
id <- c(1,1,1,1,2,2,2,2,3,3,3)
speed <- c(40,30,50,40,45,50,30,55,50,50,60)
i <- cbind(id, speed)
limit <- 35
Say, if 'speed' crosses 'limit' will count it as 1. And you will count again only if speed comes below and crosses the 'limit'.
I want data to be like.
id | Speed Viol.
----------
1 | 2
---------
2 | 2
---------
3 | 1
---------
here id (count).
id1 (1) 40 (2) 50,40
id2 (1) 45,50 (2) 55
id3 (1) 50,50,60
How to do it not using if()
.
Upvotes: 1
Views: 811
Reputation: 32548
aggregate(speed~id, data.frame(i), function(x) sum(rle(x>limit)$values))
# id speed
#1 1 2
#2 2 2
#3 3 1
The main idea is that x > limit
will check for instances when the speed limit is violated and rle(x)
will group those instances into consecutive violations or consecutive non-violations. Then all you need to do is to count the groups of consecutive violations (when rle(x>limit)$values
is TRUE
).
Upvotes: 0
Reputation: 38510
Here's a method tapply
as suggested in the comments and the original vectors.
tapply(speed, id, FUN=function(x) sum(c(x[1] > limit, diff(x > limit)) > 0))
1 2 3
2 2 1
tapply
applies a function to each group, here, by ID. The function checks if the first element of an ID is over 35, and then concatenates this to the output of diff
, whose argument is checking if subsequent observations are greater than 35. Thus diff
checks if an ID returns to above 35 after dropping below that level. Negative values in the resulting vector are converted to FALSE (0) with > 0
and these results are summed.
tapply
returns a named vector, which can be fairly nice to work with. However, if you want a data.frame, then you could use aggregate
instead as suggested by d.b:
aggregate(speed, list(id=id), FUN=function(x) sum(c(x[1] > limit, diff(x > limit)) > 0))
id x
1 1 2
2 2 2
3 3 1
Upvotes: 5
Reputation: 12074
Here's a dplyr
solution. I group by id
then check if speed is above the limit in each row, but wasn't in the previous entry. (I get the previous row using lag
). If this is the case, it produces TRUE
. Or, if it's the first row for the id
(i.e., row_number()==1
) and it's above the limit, this gives a TRUE
, too. Then, I sum all the TRUE
values for each id
using summarise
.
id <- c(1,1,1,1,2,2,2,2,3,3,3)
speed <- c(40,30,50,40,45,50,30,55,50,50,60)
i <- data.frame(id, speed)
limit <- 35
library(dplyr)
i %>%
group_by(id) %>%
mutate(viol=(speed>limit&lag(speed)<limit)|(row_number()==1&speed>limit)) %>%
summarise(sum(viol))
# A tibble: 3 x 2
id `sum(viol)`
<dbl> <int>
1 1 2
2 2 2
3 3 1
Upvotes: 2
Reputation: 51592
Here is another option with data.table
,
library(data.table)
setDT(i)[, id1 := rleid(speed > limit), by = id][
speed > limit, .(violations = uniqueN(id1)), by = id][]
which gives,
id violations 1: 1 2 2: 2 2 3: 3 1
Upvotes: 1