Sparhawk
Sparhawk

Reputation: 1677

How can I perform conditional operations depending on other rows in R?

I have a data frame similar to the following. There are many Strains and Days.

  Strain Day Parasite Rep1 Rep2 Rep3
1    KO1   1      Red    5    6    7
2    KO1   1    Green    6    7    8
3    KO1   1     Both    3    1    5
4    KO2   1      Red    5    6    7
5    KO2   1    Green    6    7    8
6    KO2   1     Both   10   10   10

Some parasites are red, some are green, and some are both. I'd like to create a new data frame where the new Red <- Red+Both, and the new Green <- Green+Both (for Rep1, Rep2 and Rep3).

Specifically, if Parasite=="Red" | Parasite=="Green", then add to Rep1 the value of Rep1 for Parasite=="Both, from the same Strain and Day. Repeat for Rep2 and Rep3 for this row, then repeat for all other Parasite=="Red" | Parasite=="Green". For the final data frame, don't include rows where Parasite=="Both.

The new data frame should look like this.

  Strain Day Parasite Obs1 Obs2 Obs3
1    KO1   1      Red    8    7   12
2    KO1   1    Green    9    8   13
3    KO2   1      Red   15   16   17
4    KO2   1    Green   16   17   18

Upvotes: 2

Views: 677

Answers (2)

akrun
akrun

Reputation: 886938

We can use data.table for this. We create a vector of column names that start with 'Rep' using grep ('nm1'). Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'Strain', 'Day', we subset the columns 'nm1' (.SD[, nm1, with=FALSE]). This can be used along with 'Parasite' as input for Map. Subset each column in .SD[, nm1, with=FALSE] based on values in 'Parasite' that are either 'Red' or 'Both' and 'Green' or 'Both'. get the sum of each of the columns within the Map. Then, we create the 'Parasite' column by recycling the 'Red', 'Green' string and change the column names if required (setnames(..).

 library(data.table)
 nm1 <- grep('^Rep', names(df1), value=TRUE)
 res <- setDT(df1)[, Map(function(x,y) c(sum(x[y %in% c('Red', 'Both')]), 
                                    sum(x[y %in% c('Green', 'Both')])),
             .SD[, nm1, with=FALSE], list(Parasite)), .(Strain, Day)
                 ][, Parasite:=c('Red', 'Green')][]
setnames(res, 2:4, paste0('Obs', 1:3))
res
#   Strain Day Obs1 Obs2 Obs3 Parasite
#1:    KO1   1    8    7   12      Red
#2:    KO1   1    9    8   13    Green
#3:    KO2   1   15   16   17      Red
#4:    KO2   1   16   17   18    Green

str(res)
#Classes ‘data.table’ and 'data.frame':  4 obs. of  6 variables:
# $ Strain  : chr  "KO1" "KO1" "KO2" "KO2"
# $ Obs1    : int  1 1 1 1
# $ Obs2    : int  8 9 15 16  
# $ Obs3    : int  7 8 16 17
# $ Rep3    : int  12 13 17 18
# $ Parasite: chr  "Red" "Green" "Red" "Green"

Or we can use lapply

res1 <- setDT(df1)[, c(list(Parasite=c('Red', 'Green')),
         lapply(.SD[, nm1, with=FALSE], function(x) 
                 c(sum(x[Parasite %in% c('Red', 'Both')]), 
                 sum(x[Parasite %in% c('Green', 'Both')])))), 
                        .(Strain, Day)]
setnames(res1, nm1, paste0('Obs', 1:3))

data

df1 <- structure(list(Strain = c("KO1", "KO1", "KO1", "KO2", "KO2", 
"KO2"), Day = c(1L, 1L, 1L, 1L, 1L, 1L), Parasite = c("Red", 
"Green", "Both", "Red", "Green", "Both"), Rep1 = c(5L, 6L, 3L, 
5L, 6L, 10L), Rep2 = c(6L, 7L, 1L, 6L, 7L, 10L), Rep3 = c(7L, 
8L, 5L, 7L, 8L, 10L)), .Names = c("Strain", "Day", "Parasite", 
"Rep1", "Rep2", "Rep3"), class = "data.frame", 
 row.names = c("1", "2", "3", "4", "5", "6"))

Upvotes: 1

road_to_quantdom
road_to_quantdom

Reputation: 1361

assuming your data is stored in the variable data

library(data.table)
# that way we can use "by=StrainDay" later on
data$StrainDay <- paste(data$Strain,data$Day,sep="_")

# this will sum up the green parasites and both by StrainDay which we created earlier
res1 <- data[,list("Obs1"=sum(Rep1[Parasite=="Green" | Parasite=="Both"]),
                   "Obs2"=sum(Rep2[Parasite=="Green" | Parasite=="Both"]),
                   "Obs3"=sum(Rep3[Parasite=="Green" | Parasite=="Both"]),
                   "Parasite"="Green"),
             by=StrainDay]
# this will sum up the red parasites and both by StrainDay as well
res2 <- data[,list("Obs1"=sum(Rep1[Parasite=="Red" | Parasite=="Both"]),
                   "Obs2"=sum(Rep2[Parasite=="Red" | Parasite=="Both"]),
                   "Obs3"=sum(Rep3[Parasite=="Red" | Parasite=="Both"]),
                   "Parasite"="Red"),
             by=StrainDay]
# the output isn't in the same order as yours, but you can fix that easily
res <- rbind(res1,res2)

res$Strain <- unlist(strsplit(res$StrainDay,"_"))[seq(1,nrow(res),2)]
res$Day <- unlist(strsplit(res$StrainDay,"_"))[seq(2,nrow(res),2)]

res$StrainDay <- NULL

> res
#   Obs1 Obs2 Obs3 Parasite Strain Day
#1:    9    8   13    Green    KO1   1
#2:   16   17   18    Green    KO2   1
#3:    8    7   12      Red    KO1   1
#4:   15   16   17      Red    KO2   1

Upvotes: 2

Related Questions