Reputation: 1677
I have a data frame similar to the following. There are many Strain
s and Day
s.
Strain Day Parasite Rep1 Rep2 Rep3
1 KO1 1 Red 5 6 7
2 KO1 1 Green 6 7 8
3 KO1 1 Both 3 1 5
4 KO2 1 Red 5 6 7
5 KO2 1 Green 6 7 8
6 KO2 1 Both 10 10 10
Some parasites are red, some are green, and some are both. I'd like to create a new data frame where the new Red <- Red+Both
, and the new Green <- Green+Both
(for Rep1
, Rep2
and Rep3
).
Specifically, if Parasite=="Red" | Parasite=="Green"
, then add to Rep1
the value of Rep1
for Parasite=="Both
, from the same Strain
and Day
. Repeat for Rep2
and Rep3
for this row, then repeat for all other Parasite=="Red" | Parasite=="Green"
. For the final data frame, don't include rows where Parasite=="Both
.
The new data frame should look like this.
Strain Day Parasite Obs1 Obs2 Obs3
1 KO1 1 Red 8 7 12
2 KO1 1 Green 9 8 13
3 KO2 1 Red 15 16 17
4 KO2 1 Green 16 17 18
Upvotes: 2
Views: 677
Reputation: 886938
We can use data.table
for this. We create a vector of column names that start with 'Rep' using grep
('nm1'). Convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'Strain', 'Day', we subset the columns 'nm1' (.SD[, nm1, with=FALSE]
). This can be used along with 'Parasite' as input for Map
. Subset each column in .SD[, nm1, with=FALSE]
based on values in 'Parasite' that are either 'Red' or 'Both' and 'Green' or 'Both'. get the sum
of each of the columns within the Map
. Then, we create the 'Parasite' column by recycling the 'Red', 'Green' string and change the column names if required (setnames(..
).
library(data.table)
nm1 <- grep('^Rep', names(df1), value=TRUE)
res <- setDT(df1)[, Map(function(x,y) c(sum(x[y %in% c('Red', 'Both')]),
sum(x[y %in% c('Green', 'Both')])),
.SD[, nm1, with=FALSE], list(Parasite)), .(Strain, Day)
][, Parasite:=c('Red', 'Green')][]
setnames(res, 2:4, paste0('Obs', 1:3))
res
# Strain Day Obs1 Obs2 Obs3 Parasite
#1: KO1 1 8 7 12 Red
#2: KO1 1 9 8 13 Green
#3: KO2 1 15 16 17 Red
#4: KO2 1 16 17 18 Green
str(res)
#Classes ‘data.table’ and 'data.frame': 4 obs. of 6 variables:
# $ Strain : chr "KO1" "KO1" "KO2" "KO2"
# $ Obs1 : int 1 1 1 1
# $ Obs2 : int 8 9 15 16
# $ Obs3 : int 7 8 16 17
# $ Rep3 : int 12 13 17 18
# $ Parasite: chr "Red" "Green" "Red" "Green"
Or we can use lapply
res1 <- setDT(df1)[, c(list(Parasite=c('Red', 'Green')),
lapply(.SD[, nm1, with=FALSE], function(x)
c(sum(x[Parasite %in% c('Red', 'Both')]),
sum(x[Parasite %in% c('Green', 'Both')])))),
.(Strain, Day)]
setnames(res1, nm1, paste0('Obs', 1:3))
df1 <- structure(list(Strain = c("KO1", "KO1", "KO1", "KO2", "KO2",
"KO2"), Day = c(1L, 1L, 1L, 1L, 1L, 1L), Parasite = c("Red",
"Green", "Both", "Red", "Green", "Both"), Rep1 = c(5L, 6L, 3L,
5L, 6L, 10L), Rep2 = c(6L, 7L, 1L, 6L, 7L, 10L), Rep3 = c(7L,
8L, 5L, 7L, 8L, 10L)), .Names = c("Strain", "Day", "Parasite",
"Rep1", "Rep2", "Rep3"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6"))
Upvotes: 1
Reputation: 1361
assuming your data is stored in the variable data
library(data.table)
# that way we can use "by=StrainDay" later on
data$StrainDay <- paste(data$Strain,data$Day,sep="_")
# this will sum up the green parasites and both by StrainDay which we created earlier
res1 <- data[,list("Obs1"=sum(Rep1[Parasite=="Green" | Parasite=="Both"]),
"Obs2"=sum(Rep2[Parasite=="Green" | Parasite=="Both"]),
"Obs3"=sum(Rep3[Parasite=="Green" | Parasite=="Both"]),
"Parasite"="Green"),
by=StrainDay]
# this will sum up the red parasites and both by StrainDay as well
res2 <- data[,list("Obs1"=sum(Rep1[Parasite=="Red" | Parasite=="Both"]),
"Obs2"=sum(Rep2[Parasite=="Red" | Parasite=="Both"]),
"Obs3"=sum(Rep3[Parasite=="Red" | Parasite=="Both"]),
"Parasite"="Red"),
by=StrainDay]
# the output isn't in the same order as yours, but you can fix that easily
res <- rbind(res1,res2)
res$Strain <- unlist(strsplit(res$StrainDay,"_"))[seq(1,nrow(res),2)]
res$Day <- unlist(strsplit(res$StrainDay,"_"))[seq(2,nrow(res),2)]
res$StrainDay <- NULL
> res
# Obs1 Obs2 Obs3 Parasite Strain Day
#1: 9 8 13 Green KO1 1
#2: 16 17 18 Green KO2 1
#3: 8 7 12 Red KO1 1
#4: 15 16 17 Red KO2 1
Upvotes: 2