Reputation: 1112
I would like to ask experts to help me to combine different number of columns by group (sample). Each sample may have 1~3 replicates. I need to merge data from replicates into samples. For each row, Within each sample if the character values across all replicates are the same, just take one; if different, ignoring "NC" or "AB" when "AA" or "BB" exist; if only "NC" and "AB" exist in the replicates, take "AB" ignoring "NC"; if "AA" and "BB" exist in different replicates, give "BA" to that data point. the data frame is:
df = read.table(text="ID SP01_1 SP01_2 SP02_1 SP02_2 SP03_1 SP03_2 SP03_3
T001 AA AA AA AB AA AB AA
T002 AA AB BB AA AA AA AA
T003 AB AB BB BB AA BB BB
T004 NC NC NC BB BB NC BB
T005 BB AB BB NC AA BB AB
T006 AB BB BB BB NC AB BB
T007 AA NC AA BB AA AA BB
T008 AB BB BB AB BB BB BB", header=T, stringsAsFactors=F)
the result is expected as:
ID SP01 SP02 SP03
T001 AA AA AA
T002 AA BA AA
T003 AB BB BA
T004 NC BB BB
T005 BB BB BA
T006 BB BB BB
T007 AA BA BA
T008 BB BB BB
Upvotes: 2
Views: 1252
Reputation: 6711
There are multiple ways to do this. I like using the reshape
package.
# Rearrange the data
library(reshape)
mdf = melt(df,id.var="ID")
mdf$variable = substr(mdf$variable,1,4)
mdf$value = as.character(mdf$value)
head(mdf)
# ID variable value
# 1 T001 SP01 AA
# 2 T002 SP01 AA
# 3 T003 SP01 AB
# 4 T004 SP01 NC
# 5 T005 SP01 BB
# 6 T006 SP01 AB
# The function to aggregate the replicates
foo = function(reps) {
if( "AA" %in% reps && "BB" %in% reps ) {
return("BA")
} else {
matches = c("AA","BB","AB","NC")
return( matches[min(match(reps,matches))] )
}
}
All the hard work is now done.
# A function to "cast" the melted data back into a data frame
cast(mdf, ID ~ variable, fun.aggregate=foo )
# ID SP01 SP02 SP03
# 1 T001 AA AA AA
# 2 T002 AA BA AA
# 3 T003 AB BB BA
# 4 T004 NC BB BB
# 5 T005 BB BB BA
# 6 T006 BB BB BB
# 7 T007 AA BA BA
# 8 T008 BB BB BB
Upvotes: 1