Reputation: 107
I have not found anything remotely similar on SO (or elsewhere) and am therefore hoping for your help. I am not yet very familiar with finding vectorised approaches and my initial attempt feels quite clumsy.
I currently have a data frame similar to this:
df <- data.frame(c(1,1,1,2,2,2,3,3,3),c(TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE))
colnames(df) <- c("ID", "Status")
I would now like to simplify my observations, showing TRUE only if every single status for the particular ID is given as TRUE, i.e. a final table like
ID Status
1 FALSE
2 FALSE
3 TRUE
I have managed to do it in a loop (again, even for a loop it might be quite clumsy):
NrID <- df$ID[!duplicated(df$ID)]
for (i in NrID) {
x <- sum(df$Status[df$ID == i])
# TRUE only if every row for this ID is TRUE: compare with the group's row count, not max(NrID)
df$Status[df$ID == i] <- x == sum(df$ID == i)
}
finaldf <- df[!duplicated(df$ID), ]
I would appreciate any advice or functions for vectorising this approach, since my final dataset is quite large and I would also like cleaner code.
Thanks in advance!
Upvotes: 0
Views: 31
Reputation: 33548
If speed and concision are what you are after, you might like data.table:
Setup:
library(data.table)
setDT(df) # Convert to data.table
Calculations:
df[, .(Status = all(Status)), by = ID]
# ID Status
# 1: 1 FALSE
# 2: 2 FALSE
# 3: 3 TRUE
Upvotes: 1
Reputation: 40161
A dplyr solution can be:
library(dplyr)
df %>%
group_by(ID) %>%
summarise(Status = all(Status))
ID Status
<dbl> <lgl>
1 1. FALSE
2 2. FALSE
3 3. TRUE
Or with base R:
aggregate(df$Status, list(df$ID), function(x) all(x))
Group.1 x
1 1 FALSE
2 2 FALSE
3 3 TRUE
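If you would rather keep the original column names instead of Group.1 and x, the formula interface of aggregate is one option. This is only a minimal sketch: it rebuilds the toy data from the question so it runs on its own, and the name res is just a placeholder I picked for illustration.

```r
# Same toy data as in the question, with the column names set directly
df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Status = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Formula interface: collapse Status within each ID using all()
res <- aggregate(Status ~ ID, data = df, FUN = all)
res

# If a named logical vector is enough, tapply gives the same answer
tapply(df$Status, df$ID, all)
```

The aggregate call returns a data frame with columns ID and Status, while tapply returns a vector named by ID, so which one fits better depends on what you do with the result afterwards.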
Upvotes: 2