Vectorised approach to combine multiple observations

Question

I have not found anything remotely similar on SO (or elsewhere) and am therefore hoping for your help. I am not yet very familiar with finding vectorised approaches and my initial attempt feels quite clumsy.

I currently have a data frame similar to this:

df <- data.frame(c(1,1,1,2,2,2,3,3,3),c(TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE))
colnames(df) <- c("ID", "Status")

I would now like to simplify my observations, showing TRUE only if every single status for the particular ID is given as TRUE, i.e. a final table like

ID    Status
1     FALSE
2     FALSE
3     TRUE

I have managed to do it in a loop (again, even for a loop it might be quite clumsy):

NrID <- df$ID[!duplicated(df$ID)]

for (i in NrID) {
  rows <- df$ID == i
  # TRUE only if the number of TRUE statuses equals the number of rows for this ID
  # (comparing against max(NrID) only worked here by coincidence)
  df$Status[rows] <- sum(df$Status[rows]) == sum(rows)
}

finaldf <- df[!duplicated(df$ID), ]

I would appreciate any advice or functions for vectorising this approach, since my final dataset is quite large and I would also like cleaner code.

Thanks in advance!

tmfmnk · Accepted Answer

A dplyr solution can be:

library(dplyr)

df %>%
 group_by(ID) %>%
 summarise(Status = all(Status))

     ID Status
  <dbl> <lgl> 
1    1. FALSE 
2    2. FALSE 
3    3. TRUE  
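One thing to watch with real data: all() returns NA for a group that contains a missing status and no FALSE, and the same applies inside summarise(), since it simply calls all() once per group. Below is a minimal sketch in base R, assuming missing values should be ignored. That is an assumption (the question's data has no NAs), and df_na is a made-up example, not the asker's data.

```r
# Hypothetical data with missing statuses (not from the question)
df_na <- data.frame(ID = c(1, 1, 2, 2),
                    Status = c(TRUE, NA, FALSE, NA))

# Default behaviour: TRUE plus NA gives NA, but a single FALSE still decides the result
res_default <- tapply(df_na$Status, df_na$ID, all)

# With na.rm = TRUE the missing values are dropped before the check
res_narm <- tapply(df_na$Status, df_na$ID, all, na.rm = TRUE)

print(res_default)
print(res_narm)
```

If missing statuses should instead count as "not TRUE", leaving na.rm at its default and treating an NA result as FALSE may be the safer choice.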

Or with base R:

aggregate(df$Status, list(df$ID), function(x) all(x))

  Group.1     x
1       1 FALSE
2       2 FALSE
3       3  TRUE
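If the generic Group.1 and x column names are a nuisance, the formula interface of aggregate() keeps the original names. This is a small variation on the call above rather than a different method; the data frame is rebuilt here only so the sketch runs on its own.

```r
# Same data as in the question, with the column names set directly
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 Status = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE))

# Formula interface: summarise Status by ID; the result keeps both column names
res <- aggregate(Status ~ ID, data = df, FUN = all)
print(res)
```

Note that the formula interface drops rows with missing values by default (na.action = na.omit), which may or may not be what you want on a larger dataset.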
