How do I check whether a value is present in any of several rows with the same ID in R?

Question

I have a large (550k x 200) data frame that contains information on incidents and actors. Multiple actors can be involved with an incident, and I am trying to create an incident-level data frame with information about whether any actor has a certain attribute.

A reprex would be:

incident <- c("A", "B", "C", "D", "A", "B")
actors <- c(1, 2, 3, 4, 5, 6)
attribute <- c("red", "blue", "red", "blue", "blue", "red")
df <- data.frame(cbind(incident, actors, attribute))

My goal is to have a dummy variable indicating whether each incident involved a red actor.

G. Grothendieck · Accepted Answer

1) Base R: put a + in front of ave if you want 0/1 instead of logical values.

transform(df, red = ave(attribute == "red", incident, FUN = any))

2) dplyr: put a + in front of any if you want 0/1 instead of logical values.

library(dplyr)
df %>%
  group_by(incident) %>%
  mutate(red = any(attribute == "red")) %>%
  ungroup()

3) data.table: put a + in front of any if you want 0/1 instead of logical values.

library(data.table)
DT <- as.data.table(df)
DT[, red := any(attribute == "red"), by = incident]

4) SQL (via sqldf): this returns a red column of 0/1 values.

library(sqldf)
sqldf("select *, max(attribute = 'red') over (partition by incident) red 
  from df
  order by rowid")
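All four versions above add the flag to every actor row, so an incident with two actors still appears twice. Since the question asks for an incident-level data frame, here is a minimal base R sketch (an addition, not part of the original answer) that collapses to one row per incident with aggregate(). The result name incidents is only illustrative, and the block rebuilds df without cbind so that it runs on its own.

```r
# Reprex data, built without cbind() so actors stays numeric
incident <- c("A", "B", "C", "D", "A", "B")
actors <- c(1, 2, 3, 4, 5, 6)
attribute <- c("red", "blue", "red", "blue", "blue", "red")
df <- data.frame(incident, actors, attribute)

# One row per incident: any() asks "is at least one actor red?",
# and the leading + turns TRUE/FALSE into 1/0
incidents <- aggregate(
  x   = list(red = df$attribute == "red"),
  by  = list(incident = df$incident),
  FUN = function(x) +any(x)
)
incidents
# red is 1, 1, 1, 0 for incidents A, B, C, D
```

aggregate() returns the groups in sorted order, so the rows come back as A, B, C, D regardless of how the actor rows were ordered. Drop the + inside FUN to keep a logical column instead.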

Note

The cbind in the definition of df should be removed because cbind builds a character matrix, which coerces the numeric actors column to character. With the code below, actors keeps its original numeric class.

incident <- c("A", "B", "C", "D", "A", "B")
actors <- c(1, 2, 3, 4, 5, 6)
attribute <- c("red", "blue", "red", "blue", "blue", "red")
df <- data.frame(incident, actors, attribute)
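One way to see the coercion is to compare the column classes of the two constructions. In this sketch the names m, df_cbind and df_plain are illustrative. The exact class of actors in the cbind version depends on the R version: it is character from R 4.0.0 on, and a factor in earlier versions, where stringsAsFactors defaulted to TRUE.

```r
incident <- c("A", "B", "C", "D", "A", "B")
actors <- c(1, 2, 3, 4, 5, 6)
attribute <- c("red", "blue", "red", "blue", "blue", "red")

# cbind() on mixed vectors first builds a matrix, and a matrix holds one type
m <- cbind(incident, actors, attribute)
typeof(m)               # "character"

df_cbind <- data.frame(m)                       # actors arrives as text
df_plain <- data.frame(incident, actors, attribute)  # each vector keeps its type

class(df_cbind$actors)  # character (a factor before R 4.0.0)
class(df_plain$actors)  # "numeric"
```

The comparison attribute == "red" behaves the same on both versions, so the answer's code works either way; the difference only matters once actors is used as a number, for example in sorting or arithmetic.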
