ajax2000

Reputation: 711

Create a new column based on groups in an existing column in R

I am cleaning some data in R and have a dataset like this:

x1, x2, x3
1, 24, 41
1, 22, 40
1, 21, 38
2, 20, 40
2, 21, 40
3, 22, 41
3, 24, 40
4, 20, 41

I want to add a new column whose value depends on both the x1 and x2 columns. Within each group in x1, I want to know whether any value in x2 is greater than or equal to some threshold, say 24. If so, every row in that group gets 1 in the new column; otherwise every row in the group gets 0.

So data should look like this:

x1, x2, x3, x4
1, 24, 41, 1
1, 22, 40, 1
1, 21, 38, 1
2, 20, 40, 0
2, 21, 40, 0
3, 22, 41, 1
3, 24, 40, 1
4, 20, 41, 0

The purpose is aggregation: I would like to aggregate the data by the groups in x1 but still keep information from the other columns.

Upvotes: 2

Views: 7978

Answers (2)

acylam

Reputation: 18681

Analogous to @akrun's answer, here is the data.table equivalent:

library(data.table)

setDT(df)[, x4 := any(x2 >= 24) * 1, by = x1]

Result:

   x1 x2 x3 x4
1:  1 24 41  1
2:  1 22 40  1
3:  1 21 38  1
4:  2 20 40  0
5:  2 21 40  0
6:  3 22 41  1
7:  3 24 40  1
8:  4 20 41  0

Data:

df = structure(list(x1 = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L), x2 = c(24L, 
22L, 21L, 20L, 21L, 22L, 24L, 20L), x3 = c(41L, 40L, 38L, 40L, 
40L, 41L, 40L, 41L)), .Names = c("x1", "x2", "x3"), class = "data.frame", row.names = c(NA, 
-8L))

Upvotes: 1

akrun

Reputation: 887153

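Here is one option with base R:

# table() counts matches per group, so compare with 0 and index by group label
df1$x4 <- as.integer(table(df1$x1, df1$x2 >= 24)[as.character(df1$x1), "TRUE"] > 0)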
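Another base R option, sketched here as an alternative rather than as part of the original answer, is ave(), which evaluates the condition once per x1 group and does not depend on how the groups are labelled. The data frame df1 is rebuilt from the question's example so the snippet runs on its own:

```r
# Example data from the question
df1 <- data.frame(
  x1 = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L),
  x2 = c(24L, 22L, 21L, 20L, 21L, 22L, 24L, 20L),
  x3 = c(41L, 40L, 38L, 40L, 40L, 41L, 40L, 41L)
)

# ave() applies FUN within each x1 group and returns a vector as long as the input;
# any() yields one logical per group, which is recycled across that group's rows
df1$x4 <- as.integer(ave(df1$x2 >= 24, df1$x1, FUN = any))

df1
```

With the example data this should flag groups 1 and 3 and leave groups 2 and 4 at 0.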

Or with dplyr:

library(dplyr)
df1 %>%
  group_by(x1) %>%
  mutate(x4 = as.integer(any(x2 >= 24)))
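Since the stated goal is to aggregate by x1 afterwards, here is a rough base R sketch of that follow-up step. The question does not say which summary is wanted, so using mean for x2 and x3 is purely an assumption for illustration, as is the name agg:

```r
# Example data from the question, with the flag column added per group
df1 <- data.frame(
  x1 = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L),
  x2 = c(24L, 22L, 21L, 20L, 21L, 22L, 24L, 20L),
  x3 = c(41L, 40L, 38L, 40L, 40L, 41L, 40L, 41L)
)
df1$x4 <- as.integer(ave(df1$x2 >= 24, df1$x1, FUN = any))

# x4 is constant within each x1 group, so including it on the grouping side of
# the formula carries it into the result without splitting any group further.
# FUN = mean is an assumed summary; swap in whatever aggregation is needed.
agg <- aggregate(cbind(x2, x3) ~ x1 + x4, data = df1, FUN = mean)

agg
```

The result has one row per x1 group, with x4 retained alongside the summarised columns.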

Upvotes: 3
