Reputation: 364
I have a matrix of data that looks like the following:
> taxmat = matrix(sample(letters, 70, replace = TRUE), nrow = 10, ncol = 7)
> rownames(taxmat) <- paste0("OTU", 1:nrow(taxmat))
> taxmat<-cbind(taxmat,c("Genus","Genus","Genus","Family","Family","Order","Genus","Species","Genus","Species"))
> colnames(taxmat) <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species", "Lowest")
> taxmat
Domain Phylum Class Order Family Genus Species Lowest
OTU1 "h" "c" "q" "e" "q" "w" "v" "Genus"
OTU2 "f" "y" "q" "z" "p" "w" "v" "Genus"
OTU3 "w" "q" "i" "i" "z" "j" "f" "Genus"
OTU4 "c" "e" "f" "n" "z" "b" "d" "Family"
OTU5 "g" "w" "q" "k" "e" "x" "k" "Family"
OTU6 "x" "j" "l" "w" "z" "o" "q" "Order"
OTU7 "k" "s" "j" "y" "t" "a" "t" "Genus"
OTU8 "w" "u" "s" "w" "g" "y" "n" "Species"
OTU9 "t" "r" "t" "o" "i" "l" "z" "Genus"
OTU10 "x" "p" "j" "f" "k" "q" "w" "Species"
The column "Lowest" gives the lowest taxonomic rank I have confidence in for that row. For each row, I would like to replace the value(s) in every rank column after the one named in "Lowest" with "unknown".
Expected output for this example would be:
Domain Phylum Class Order Family Genus Species Lowest
OTU1 "b" "b" "v" "v" "l" "n" "unknown" "Genus"
OTU2 "l" "m" "w" "b" "f" "y" "unknown" "Genus"
OTU3 "h" "w" "n" "y" "k" "f" "unknown" "Genus"
OTU4 "u" "m" "p" "n" "t" "unknown" "unknown" "Family"
OTU5 "o" "b" "q" "w" "a" "unknown" "unknown" "Family"
OTU6 "s" "j" "l" "d" "unknown" "unknown" "unknown" "Order"
OTU7 "v" "y" "t" "p" "s" "v" "unknown" "Genus"
OTU8 "b" "r" "k" "d" "q" "c" "q" "Species"
OTU9 "k" "h" "b" "w" "h" "x" "unknown" "Genus"
OTU10 "o" "p" "b" "n" "k" "d" "q" "Species"
I can get all the indexes to replace as a vector with
idx <- lapply(taxmat[, "Lowest"], grep, colnames(taxmat))
idx <- as.numeric(unlist(idx)) + 1
But I'm not sure how to replace those values. Thanks for your help!
Upvotes: 1
Views: 1066
Reputation: 887911
We can loop through the rows with apply and find the column to start from by matching the last element of each row (the value in 'Lowest') against the column names. Every rank column after that match is then replaced with 'unknown'. Here 'm1' stands for the matrix from the question.
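t(apply(m1, 1, function(x) {
    # first column to overwrite: the one after the rank named in 'Lowest'
    i1 <- match(x[8], names(x)[-8]) + 1
    # 'Lowest' is already the last rank: use index 0, which replaces nothing
    i1[i1 > 7] <- 0
    i1 <- if (i1 != 0) i1:7 else i1
    c(replace(x[-8], i1, "unknown"), x[8])
}))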
# Domain Phylum Class Order Family Genus Species Lowest
#OTU1 "b" "b" "v" "v" "l" "n" "unknown" "Genus"
#OTU2 "l" "m" "w" "b" "f" "y" "unknown" "Genus"
#OTU3 "h" "w" "n" "y" "k" "f" "unknown" "Genus"
#OTU4 "u" "m" "p" "n" "t" "unknown" "unknown" "Family"
#OTU5 "o" "b" "q" "w" "a" "unknown" "unknown" "Family"
#OTU6 "s" "j" "l" "d" "unknown" "unknown" "unknown" "Order"
#OTU7 "v" "y" "t" "p" "s" "v" "unknown" "Genus"
#OTU8 "b" "r" "k" "d" "q" "c" "q" "Species"
#OTU9 "k" "h" "b" "w" "h" "x" "unknown" "Genus"
#OTU10 "o" "p" "b" "n" "k" "d" "q" "Species"
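The step that sets the index to 0 relies on how R treats a zero index: it selects nothing, so replace() returns the vector unchanged. A minimal sketch of that behaviour, using a made-up three-element vector (the names and values are illustrative only, not taken from the question's data):

```r
# Hypothetical one-row example; the values are made up for illustration.
x <- c(Domain = "a", Phylum = "b", Class = "c")

# Index 0 selects nothing, so nothing is replaced.
unchanged <- replace(x, 0, "unknown")

# A real index range replaces every position in it.
changed <- replace(x, 2:3, "unknown")

print(unchanged)
print(changed)
```

This is why rows whose 'Lowest' is "Species" come back untouched from the apply call.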
Another option is to build a row/column index: match the last column of 'm1' against the column names to get the columns to overwrite in each row, pair them with the corresponding row numbers using cbind, and assign 'unknown' to those positions in 'm1'.
lst <- Map(function(x, y) if(x >y) 0 else x:y, match(m1[,8], colnames(m1)[-8])+1, 7)
m1[cbind(rep(seq_len(nrow(m1)), lengths(lst)), unlist(lst))] <- "unknown"
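For comparison, the same result can probably be reached without building the index list, by comparing each cell's column number with the position of the row's 'Lowest' rank. This is a different technique from the two above, shown only as a sketch; the small matrix, 'ranks', 'lowest_pos', 'beyond' and 'res' are made-up names and values for illustration.

```r
# Small fixed matrix (made-up values) with the same layout as in the question.
ranks <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")
m1 <- matrix(letters[1:21], nrow = 3, ncol = 7,
             dimnames = list(paste0("OTU", 1:3), ranks))
m1 <- cbind(m1, Lowest = c("Genus", "Order", "Species"))

# Column position of each row's lowest trusted rank.
lowest_pos <- match(m1[, "Lowest"], ranks)

# TRUE wherever a rank column lies to the right of that position.
# lowest_pos has one entry per row and is recycled down each column,
# so every cell is compared with the value for its own row.
beyond <- col(m1[, ranks]) > lowest_pos

# Work on a copy so the original matrix stays available for checking.
res <- m1
res[, ranks][beyond] <- "unknown"
print(res)
```

Rows whose 'Lowest' is "Species" have no TRUE entries in 'beyond', so they need no special case.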
Upvotes: 1