Reputation: 711
I am cleaning some data in R and have a dataset like this:
x1, x2, x3
1, 24, 41
1, 22, 40
1, 21, 38
2, 20, 40
2, 21, 40
3, 22, 41
3, 24, 40
4, 20, 41
I want to add a new column whose value in each row depends on both the x1 and x2 columns. Within each group in x1, I want to know whether any value in x2 is greater than or equal to a threshold, say 24. If so, every row in that group gets 1 in the new column; otherwise every row gets 0.
So the data should look like this:
x1, x2, x3, x4
1, 24, 41, 1
1, 22, 40, 1
1, 21, 38, 1
2, 20, 40, 0
2, 21, 40, 0
3, 22, 41, 1
3, 24, 40, 1
4, 20, 41, 0
The purpose of this is for aggregating the rows. I would like to aggregate the data based on groups in x1, but still need information on the other columns.
Upvotes: 2
Views: 7978
Reputation: 18681
Analogous to @akrun's answer, here is the data.table equivalent:
library(data.table)
setDT(df)[, x4 := any(x2>=24)*1, by=x1]
Result:
x1 x2 x3 x4
1: 1 24 41 1
2: 1 22 40 1
3: 1 21 38 1
4: 2 20 40 0
5: 2 21 40 0
6: 3 22 41 1
7: 3 24 40 1
8: 4 20 41 0
Data:
df = structure(list(x1 = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L), x2 = c(24L,
22L, 21L, 20L, 21L, 22L, 24L, 20L), x3 = c(41L, 40L, 38L, 40L,
40L, 41L, 40L, 41L)), .Names = c("x1", "x2", "x3"), class = "data.frame", row.names = c(NA,
-8L))
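Since the question mentions aggregating by x1 afterwards, here is a minimal sketch of how that step might look in data.table. The names `dt`, `threshold`, and `agg`, and the choice of `max()` and `mean()` as summaries, are assumptions made for illustration; substitute whatever aggregation you actually need.

```r
library(data.table)

# Same data as in the question
dt <- data.table(
  x1 = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L),
  x2 = c(24L, 22L, 21L, 20L, 21L, 22L, 24L, 20L),
  x3 = c(41L, 40L, 38L, 40L, 40L, 41L, 40L, 41L)
)

threshold <- 24L  # cut-off taken from the question

# Flag every row of a group in which any x2 reaches the threshold
dt[, x4 := as.integer(any(x2 >= threshold)), by = x1]

# Collapse to one row per group; max() and mean() are placeholder summaries.
# x4 is constant within a group, so its first value represents the group.
agg <- dt[, .(x2_max = max(x2), x3_mean = mean(x3), x4 = x4[1L]), by = x1]
```

Because x4 is identical for all rows of a group, it survives the aggregation unchanged and can be used to filter or split the summarised table.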
Upvotes: 1
Reputation: 887153
Here is one option with base R, using ave() to apply any() within each x1 group:
df1$x4 <- as.integer(ave(df1$x2 >= 24, df1$x1, FUN = any))
Or with dplyr
library(dplyr)
df1 %>%
  group_by(x1) %>%
  mutate(x4 = as.integer(any(x2 >= 24)))
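One thing to watch for if x2 can contain missing values: `any()` returns `NA` for a group that has an `NA` and no `TRUE`, so the flag becomes `NA` rather than 0. The sketch below shows the difference on a small made-up data frame (`df_na` and the two column names are hypothetical); whether treating `NA` as "not above the threshold" is appropriate depends on your data.

```r
library(dplyr)

# Hypothetical data: same shape as the question, but with missing x2 values
df_na <- data.frame(
  x1 = c(1L, 1L, 2L, 2L, 3L),
  x2 = c(24L, NA, 20L, NA, 25L),
  x3 = c(41L, 40L, 40L, 40L, 41L)
)

flagged <- df_na %>%
  group_by(x1) %>%
  mutate(
    # Without na.rm, a group with an NA and no TRUE yields NA
    x4_default = as.integer(any(x2 >= 24)),
    # With na.rm = TRUE, missing values are ignored and the group gets 0
    x4_na_rm = as.integer(any(x2 >= 24, na.rm = TRUE))
  ) %>%
  ungroup()
```

Here group 2 ends up with `NA` in `x4_default` but 0 in `x4_na_rm`, while groups 1 and 3 get 1 in both columns.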
Upvotes: 3