How can I perform conditional operations depending on other rows in R?

Question

I have a data frame similar to the following. There are many Strains and Days.

  Strain Day Parasite Rep1 Rep2 Rep3
1    KO1   1      Red    5    6    7
2    KO1   1    Green    6    7    8
3    KO1   1     Both    3    1    5
4    KO2   1      Red    5    6    7
5    KO2   1    Green    6    7    8
6    KO2   1     Both   10   10   10

Some parasites are red, some are green, and some are both. I'd like to create a new data frame where the new Red <- Red+Both, and the new Green <- Green+Both (for Rep1, Rep2 and Rep3).

Specifically, if Parasite=="Red" | Parasite=="Green", then add to Rep1 the value of Rep1 for Parasite=="Both" from the same Strain and Day. Repeat for Rep2 and Rep3 for this row, then repeat for all other rows where Parasite=="Red" | Parasite=="Green". For the final data frame, don't include rows where Parasite=="Both".

The new data frame should look like this.

  Strain Day Parasite Obs1 Obs2 Obs3
1    KO1   1      Red    8    7   12
2    KO1   1    Green    9    8   13
3    KO2   1      Red   15   16   17
4    KO2   1    Green   16   17   18

akrun · Accepted Answer

We can use data.table for this. First, build a vector of the column names that start with 'Rep' using grep ('nm1'). Then convert the 'data.frame' to a 'data.table' (setDT(df1)) and, grouped by 'Strain' and 'Day', subset the 'nm1' columns (.SD[, nm1, with=FALSE]). These columns are passed to Map together with 'Parasite'. For each column, Map sums the values whose 'Parasite' is 'Red' or 'Both', and then the values whose 'Parasite' is 'Green' or 'Both', giving two sums per group. Finally, we create the 'Parasite' column by repeating the strings 'Red' and 'Green' down the rows, and rename the columns if required (setnames(..)).

library(data.table)
nm1 <- grep('^Rep', names(df1), value=TRUE)
# for each Strain/Day group, return two sums per 'Rep' column:
# Red + Both first, then Green + Both
res <- setDT(df1)[, Map(function(x,y) c(sum(x[y %in% c('Red', 'Both')]),
                                    sum(x[y %in% c('Green', 'Both')])),
             .SD[, nm1, with=FALSE], list(Parasite)), .(Strain, Day)
                 # recent data.table versions do not recycle a length-2
                 # vector in `:=`, so repeat it explicitly
                 ][, Parasite:=rep(c('Red', 'Green'), length.out=.N)][]
# rename by name; positions 2:4 would rename 'Day' as well
setnames(res, nm1, paste0('Obs', 1:3))
res
#   Strain Day Obs1 Obs2 Obs3 Parasite
#1:    KO1   1    8    7   12      Red
#2:    KO1   1    9    8   13    Green
#3:    KO2   1   15   16   17      Red
#4:    KO2   1   16   17   18    Green

str(res)
#Classes ‘data.table’ and 'data.frame':  4 obs. of  6 variables:
# $ Strain  : chr  "KO1" "KO1" "KO2" "KO2"
# $ Day     : int  1 1 1 1
# $ Obs1    : int  8 9 15 16
# $ Obs2    : int  7 8 16 17
# $ Obs3    : int  12 13 17 18
# $ Parasite: chr  "Red" "Green" "Red" "Green"
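
To see what Map contributes here, it may help to run the same function on a single Strain/Day group outside data.table. The following is only a sketch in base R. The names 'grp' and 'sums' are illustrative and not part of the answer's code, and the values are the KO1, Day 1 rows from the question.

```r
# One Strain/Day group (KO1, Day 1) copied from the question's data
grp <- data.frame(
  Parasite = c("Red", "Green", "Both"),
  Rep1 = c(5L, 6L, 3L),
  Rep2 = c(6L, 7L, 1L),
  Rep3 = c(7L, 8L, 5L),
  stringsAsFactors = FALSE
)

# Map() walks over the three Rep columns. The second argument is a
# one-element list, so it is recycled and every column is paired with
# the same vector of Parasite labels.
sums <- Map(
  function(x, y) c(sum(x[y %in% c("Red", "Both")]),    # Red + Both
                   sum(x[y %in% c("Green", "Both")])), # Green + Both
  grp[c("Rep1", "Rep2", "Rep3")],
  list(grp$Parasite)
)

sums$Rep1
# [1] 8 9
```

Each list element holds two values (the new Red, then the new Green), which is why every group contributes two rows to the final result.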

Or we can use lapply, creating the 'Parasite' column inside the grouped call

res1 <- setDT(df1)[, c(list(Parasite=c('Red', 'Green')),
         lapply(.SD[, nm1, with=FALSE], function(x) 
                 c(sum(x[Parasite %in% c('Red', 'Both')]), 
                 sum(x[Parasite %in% c('Green', 'Both')])))), 
                        .(Strain, Day)]
setnames(res1, nm1, paste0('Obs', 1:3))
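
For comparison, the rule stated in the question (add the matching 'Both' row to each 'Red' and 'Green' row, then drop 'Both') can also be sketched in base R without data.table. This is a different technique from the two above, not a restatement of them. It assumes every Strain/Day group has exactly one 'Both' row; a group without one would produce NA. The names 'dat', 'rep_cols', 'both', 'single', 'idx' and 'base_res' are illustrative.

```r
# Example data from the question
dat <- data.frame(
  Strain   = rep(c("KO1", "KO2"), each = 3),
  Day      = 1L,
  Parasite = rep(c("Red", "Green", "Both"), times = 2),
  Rep1 = c(5L, 6L, 3L, 5L, 6L, 10L),
  Rep2 = c(6L, 7L, 1L, 6L, 7L, 10L),
  Rep3 = c(7L, 8L, 5L, 7L, 8L, 10L),
  stringsAsFactors = FALSE
)
rep_cols <- grep("^Rep", names(dat), value = TRUE)

both   <- dat[dat$Parasite == "Both", ]  # rows to add
single <- dat[dat$Parasite != "Both", ]  # Red and Green rows to keep

# For every Red/Green row, find the 'Both' row with the same Strain and Day
# (assumption: exactly one such row exists per group)
idx <- match(paste(single$Strain, single$Day),
             paste(both$Strain, both$Day))

base_res <- single
for (nm in rep_cols) {
  base_res[[nm]] <- single[[nm]] + both[[nm]][idx]
}
names(base_res)[match(rep_cols, names(base_res))] <-
  paste0("Obs", seq_along(rep_cols))
rownames(base_res) <- NULL
base_res
```

Unlike the grouped sums, this keeps the original row order and column order (Strain, Day, Parasite, Obs1, Obs2, Obs3), matching the layout requested in the question.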

data

df1 <- structure(list(Strain = c("KO1", "KO1", "KO1", "KO2", "KO2", 
"KO2"), Day = c(1L, 1L, 1L, 1L, 1L, 1L), Parasite = c("Red", 
"Green", "Both", "Red", "Green", "Both"), Rep1 = c(5L, 6L, 3L, 
5L, 6L, 10L), Rep2 = c(6L, 7L, 1L, 6L, 7L, 10L), Rep3 = c(7L, 
8L, 5L, 7L, 8L, 10L)), .Names = c("Strain", "Day", "Parasite", 
"Rep1", "Rep2", "Rep3"), class = "data.frame", 
 row.names = c("1", "2", "3", "4", "5", "6"))
