Reputation: 258
I have a large (550k x 200) data frame that contains information on incidents and actors. Multiple actors can be involved in an incident, and I am trying to create an incident-level data frame with information about whether any actor has a certain attribute.
A reprex would be:
incident <- c("A", "B", "C", "D", "A", "B")
actors <- c(1, 2, 3, 4, 5, 6)
attribute <- c("red", "blue", "red", "blue", "blue", "red")
df <- data.frame(cbind(incident, actors, attribute))
My goal is to have a dummy variable indicating whether each incident involved a red actor.
Reputation: 269491
1) Base R. Put a + in front of ave if you want 0/1 instead of logical.
transform(df, red = ave(attribute == "red", incident, FUN = any))
2) dplyr. Put a + in front of any if you want 0/1 instead of logical.
library(dplyr)
df %>%
  group_by(incident) %>%
  mutate(red = any(attribute == "red")) %>%
  ungroup()
3) data.table. Put a + in front of any if you want 0/1 instead of logical.
library(data.table)
DT <- as.data.table(df)
DT[, red := any(attribute == "red"), by = incident]
4) SQL (sqldf). This returns a red column of 0/1 values.
library(sqldf)
sqldf("select *, max(attribute = 'red') over (partition by incident) red
       from df
       order by rowid")
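To see what the + prefix mentioned in approaches 1 to 3 does, here is a minimal base R sketch on the question's data (rebuilt without cbind()); the object name res is only for illustration. Unary + coerces the logical result of any() to integer, so each row gets 1 if its incident has a red actor and 0 otherwise.

```r
# Question's data, built without cbind() so actors stays numeric
incident  <- c("A", "B", "C", "D", "A", "B")
actors    <- c(1, 2, 3, 4, 5, 6)
attribute <- c("red", "blue", "red", "blue", "blue", "red")
df <- data.frame(incident, actors, attribute)

# ave() runs any() within each incident and repeats that result on every row
# of the incident; the leading + turns the logical result into integer 0/1
res <- transform(df, red = +ave(attribute == "red", incident, FUN = any))
res$red
# [1] 1 1 1 0 1 1
```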
The cbind in the definition of df should be removed, since it coerces the numeric actors column to character. With the code below, actors retains its original numeric class.
incident <- c("A", "B", "C", "D", "A", "B")
actors <- c(1, 2, 3, 4, 5, 6)
attribute <- c("red", "blue", "red", "blue", "blue", "red")
df <- data.frame(incident, actors, attribute)
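One way to confirm the cbind() point is to compare the column classes of the two constructions. The name df_cbind is hypothetical, used only for the comparison. The class reported for the cbind() version depends on the R version: character in R 4.0 and later, factor before that, because of the stringsAsFactors default.

```r
incident  <- c("A", "B", "C", "D", "A", "B")
actors    <- c(1, 2, 3, 4, 5, 6)
attribute <- c("red", "blue", "red", "blue", "blue", "red")

# cbind() first builds a character matrix, so actors loses its numeric type
df_cbind <- data.frame(cbind(incident, actors, attribute))
# data.frame() on the vectors directly keeps each column's own type
df <- data.frame(incident, actors, attribute)

sapply(df_cbind, class)
sapply(df, class)
```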
Reputation: 258
When I created the reprex, I realized what I needed to do. I am sharing it here in case it helps anybody else.
The simplest solution I could come up with was:
library(plyr)
ddply(df, .(incident), summarize, Red = "red" %in% attribute)
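If loading plyr is not desirable, a base R sketch of the same incident-level summary (one row per incident) can use aggregate() with the same %in% test. The object name red_by_incident is only for illustration.

```r
incident  <- c("A", "B", "C", "D", "A", "B")
actors    <- c(1, 2, 3, 4, 5, 6)
attribute <- c("red", "blue", "red", "blue", "blue", "red")
df <- data.frame(incident, actors, attribute)

# One row per incident: TRUE if any actor in that incident is "red"
red_by_incident <- aggregate(df["attribute"], by = df["incident"],
                             FUN = function(a) "red" %in% a)
# aggregate() names the result column after the input column; rename it
names(red_by_incident)[names(red_by_incident) == "attribute"] <- "Red"
red_by_incident
```

For 0/1 instead of TRUE/FALSE, use +("red" %in% a) as the function body. In dplyr, the same incident-level result comes from replacing mutate() with summarise() in approach 2 above.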