Reputation: 369
I want to compare the first values of x1 and x2 within each group of a dataset grouped by ID. If the first value of x1 in a group is greater than the first value of x2 in that group, I want to assign 1 to that ID, otherwise 0. Here is an example. My input data is below:
dt<-data.frame(ID=c(100, 100, 101, 101, 101), x1=c(1200, 1600, 1350, 1400, 1500),
x2=c(1100, 1410, 1900, 1300, 1100))
Since 1200 > 1100, ID 100 gets 1, and since 1350 < 1900, ID 101 gets 0. The expected output is:
res<-data.frame(ID=c(100, 101), res=c(1,0))
How can I do that?
Thanks
Upvotes: 0
Views: 80
Reputation: 887048
We can do this with dplyr, using first() to pick the first value in each group:
library(dplyr)
dt %>%
group_by(ID) %>%
summarise(res = +(first(x1) > first(x2)))
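The unary `+` in the summarise call may look odd, so here is a minimal base R sketch of what it does: applied to a logical value, it coerces TRUE/FALSE to the integers 1/0, much like as.integer(). The variable names below (`flags`, `plus_result`, `int_result`) are made up for illustration.

```r
# Unary plus on a logical vector coerces it to integer: TRUE -> 1L, FALSE -> 0L
flags <- c(TRUE, FALSE)
plus_result <- +flags
int_result <- as.integer(flags)

print(plus_result)                         # [1] 1 0
print(identical(plus_result, int_result))  # [1] TRUE
```

Note that this gives an integer column rather than a double one; wrap the comparison in as.numeric() instead if the column type matters.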
Upvotes: 1
Reputation: 21908
You can also use the following solution. I hope I understood correctly what you have in mind:
library(dplyr)
dt %>%
group_by(ID) %>%
summarise(res = ifelse(first(x1) > first(x2), 1, 0))
# A tibble: 2 x 2
     ID   res
  <dbl> <dbl>
1   100     1
2   101     0
Upvotes: 1
Reputation: 6206
You can group by ID using dplyr, access the first element of each group using [1], and then compare them using an if_else statement in summarise:
library(dplyr)  # provides the %>% pipe
dt %>%
dplyr::group_by(ID) %>%
dplyr::summarise(res = dplyr::if_else(x1[1] > x2[1], 1, 0))
Output:
# A tibble: 2 x 2
     ID   res
  <dbl> <dbl>
1   100     1
2   101     0
For completeness, here is a data.table version and a benchmark.
library(data.table)  # dt must be a data.table, so convert the example data.frame
as.data.table(dt)[, .(z = ifelse(x1[1] > x2[1], 1, 0)), by = ID]
library(data.table)
library(dplyr)  # provides the %>% pipe

# 901 IDs with 1000 rows each (901,000 rows in total)
dt = data.table(ID = rep(100:1000, each = 1000), x1 = sample(901000), x2 = sample(901000))

microbenchmark::microbenchmark(
  dplyr = dt %>%
    dplyr::group_by(ID) %>%
    dplyr::summarise(res = dplyr::if_else(x1[1] > x2[1], 1, 0)),
  data.table = dt[, .(z = ifelse(x1[1] > x2[1], 1, 0)), by = ID]
)
Unit: milliseconds
       expr       min        lq     mean    median       uq       max neval
      dplyr 39.167330 42.806415 46.91723 44.422384 46.28869 125.31500   100
 data.table  9.497764  9.844758 10.94920  9.930658 10.53419  22.87746   100
So if time is of the essence, the data.table version is about 4x faster in this benchmark (median of roughly 9.9 ms versus 44.4 ms).
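If neither package is available, the same idea can be sketched in base R. This is only an illustration and was not part of the benchmark above; it assumes the rows are already in the order that defines the "first" value for each ID, and the name `first_rows` is made up for the example.

```r
# Example data from the question
dt <- data.frame(ID = c(100, 100, 101, 101, 101),
                 x1 = c(1200, 1600, 1350, 1400, 1500),
                 x2 = c(1100, 1410, 1900, 1300, 1100))

# Keep only the first row of each ID (assumes rows are already in the desired order)
first_rows <- dt[!duplicated(dt$ID), ]

# Compare the first x1 and x2 of each ID; as.numeric turns TRUE/FALSE into 1/0
res <- data.frame(ID = first_rows$ID,
                  res = as.numeric(first_rows$x1 > first_rows$x2))
print(res)
```

For the example data this assigns 1 to ID 100 and 0 to ID 101, matching the expected output in the question.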
Upvotes: 1