WillyWonka

Reputation: 107

Vectorised approach to combine multiple observations

I have not found anything remotely similar on SO (or elsewhere) and am therefore hoping for your help. I am not yet very familiar with finding vectorised approaches and my initial attempt feels quite clumsy.

I currently have a data frame similar to this:

df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Status = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

I would now like to simplify my observations, showing TRUE only if every single status for the particular ID is given as TRUE, i.e. a final table like

ID    Status
1     FALSE
2     FALSE
3     TRUE

I have managed to do it in a loop (again, even for a loop it might be quite clumsy):

NrID <- df$ID[!duplicated(df$ID)]

for (i in NrID) {
  rows <- df$ID == i
  # Number of TRUE statuses for this ID
  x <- sum(df$Status[rows])
  # TRUE only if every row of this ID is TRUE (compare with the group's row count, not max(NrID))
  df$Status[rows] <- x == sum(rows)
}

finaldf <- df[!duplicated(df$ID), ]

I would appreciate any advice or functions for vectorising this approach, since my final dataset is quite large and I would also like cleaner code.

Thanks in advance!

Upvotes: 0

Views: 31

Answers (2)

s_baldur

Reputation: 33548

If speed and concision are what you are after, you might like data.table:

Setup:

library(data.table)
setDT(df) # Convert to data.table

Calculations:

df[, .(Status = all(Status)), by = ID]

#    ID Status
# 1:  1  FALSE
# 2:  2  FALSE
# 3:  3   TRUE

Upvotes: 1

tmfmnk

Reputation: 40161

A dplyr solution can be:

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(Status = all(Status))

     ID Status
  <dbl> <lgl> 
1    1. FALSE 
2    2. FALSE 
3    3. TRUE 

Or with base R:

aggregate(df$Status, list(df$ID), all)

  Group.1     x
1       1 FALSE
2       2 FALSE
3       3  TRUE
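
If the generic column names Group.1 and x are a nuisance, the formula interface of aggregate may be worth a try, since it should keep the original names. The sketch below rebuilds the example data from the question so it runs on its own; tapply is shown as a further base R option that is not part of the answer above, and the names result and by_id are made up for illustration.

```r
# Example data from the question
df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Status = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Formula interface: one row per ID, columns keep the names ID and Status
result <- aggregate(Status ~ ID, data = df, FUN = all)
print(result)

# tapply gives one logical value per ID, named by the ID
by_id <- tapply(df$Status, df$ID, all)
print(by_id)
```

The aggregate call returns a data frame, which is convenient if the result feeds into further joins or filters; tapply returns a compact named result that is handy for quick lookups such as by_id[["3"]].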

Upvotes: 2
