helicase

Replace values in data frame from column of indexes

I have a matrix of data that looks like the following:

> taxmat = matrix(sample(letters, 70, replace = TRUE), nrow = 10, ncol = 7)
> rownames(taxmat) <- paste0("OTU", 1:nrow(taxmat))
> taxmat<-cbind(taxmat,c("Genus","Genus","Genus","Family","Family","Order","Genus","Species","Genus","Species"))
> colnames(taxmat) <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species", "Lowest")
> taxmat
      Domain Phylum Class Order Family Genus Species Lowest   
OTU1  "h"    "c"    "q"   "e"   "q"    "w"   "v"     "Genus"  
OTU2  "f"    "y"    "q"   "z"   "p"    "w"   "v"     "Genus"  
OTU3  "w"    "q"    "i"   "i"   "z"    "j"   "f"     "Genus"  
OTU4  "c"    "e"    "f"   "n"   "z"    "b"   "d"     "Family" 
OTU5  "g"    "w"    "q"   "k"   "e"    "x"   "k"     "Family" 
OTU6  "x"    "j"    "l"   "w"   "z"    "o"   "q"     "Order"  
OTU7  "k"    "s"    "j"   "y"   "t"    "a"   "t"     "Genus"  
OTU8  "w"    "u"    "s"   "w"   "g"    "y"   "n"     "Species"
OTU9  "t"    "r"    "t"   "o"   "i"    "l"   "z"     "Genus"  
OTU10 "x"    "p"    "j"   "f"   "k"    "q"   "w"     "Species"

The "Lowest" column gives, for each row, the lowest taxonomic rank I trust. For each row, I would like to replace the values in every rank column to the right of the one named in "Lowest" with "unknown".

Expected output for this example would be:

       Domain Phylum Class Order Family    Genus     Species   Lowest
 OTU1  "b"    "b"    "v"   "v"   "l"       "n"       "unknown" "Genus"
 OTU2  "l"    "m"    "w"   "b"   "f"       "y"       "unknown" "Genus"
 OTU3  "h"    "w"    "n"   "y"   "k"       "f"       "unknown" "Genus"
 OTU4  "u"    "m"    "p"   "n"   "t"       "unknown" "unknown" "Family"
 OTU5  "o"    "b"    "q"   "w"   "a"       "unknown" "unknown" "Family"
 OTU6  "s"    "j"    "l"   "d"   "unknown" "unknown" "unknown" "Order"
 OTU7  "v"    "y"    "t"   "p"   "s"       "v"       "unknown" "Genus"
 OTU8  "b"    "r"    "k"   "d"   "q"       "c"       "q"       "Species"
 OTU9  "k"    "h"    "b"   "w"   "h"       "x"       "unknown" "Genus"
 OTU10 "o"    "p"    "b"   "n"   "k"       "d"       "q"       "Species"

I can get all the indexes to replace as a vector with

idx <- lapply(taxmat[, "Lowest"], grep, colnames(taxmat))
idx <- as.numeric(unlist(idx)) + 1

But I'm not sure how to replace those values. Thanks for your help!

Upvotes: 1

Answers (1)

akrun

We can loop through the rows with apply. For each row, match the last element (the value in 'Lowest') against the names of the other columns to find its position, then replace every value after that position with 'unknown'. Here 'm1' is the matrix from the question.

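t(apply(m1, 1, function(x) {
  i1 <- match(x[8], names(x)[-8]) + 1     # first column after the rank in 'Lowest'
  i1[i1 > 7] <- 0                         # 'Species' rows: nothing to replace
  i1 <- if (i1 != 0) i1:7 else i1         # expand to all remaining rank columns
  c(replace(x[-8], i1, "unknown"), x[8])  # an index of 0 leaves the row unchanged
}))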
#      Domain Phylum Class Order Family    Genus     Species   Lowest   
#OTU1  "b"    "b"    "v"   "v"   "l"       "n"       "unknown" "Genus"  
#OTU2  "l"    "m"    "w"   "b"   "f"       "y"       "unknown" "Genus"  
#OTU3  "h"    "w"    "n"   "y"   "k"       "f"       "unknown" "Genus"  
#OTU4  "u"    "m"    "p"   "n"   "t"       "unknown" "unknown" "Family" 
#OTU5  "o"    "b"    "q"   "w"   "a"       "unknown" "unknown" "Family" 
#OTU6  "s"    "j"    "l"   "d"   "unknown" "unknown" "unknown" "Order"  
#OTU7  "v"    "y"    "t"   "p"   "s"       "v"       "unknown" "Genus"  
#OTU8  "b"    "r"    "k"   "d"   "q"       "c"       "q"       "Species"
#OTU9  "k"    "h"    "b"   "w"   "h"       "x"       "unknown" "Genus"  
#OTU10 "o"    "p"    "b"   "n"   "k"       "d"       "q"       "Species"
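
For anyone who wants to run this end to end, here is a minimal reproducible sketch of the same row-wise idea. It assumes 'm1' is built the way the question builds 'taxmat'; the seed is arbitrary and 'mask_row' is a hypothetical helper name, not something from the original answer.

```r
# Assumption: m1 mirrors the question's taxmat; the seed is arbitrary.
set.seed(42)
ranks <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")
m1 <- matrix(sample(letters, 70, replace = TRUE), nrow = 10, ncol = 7)
m1 <- cbind(m1, c("Genus", "Genus", "Genus", "Family", "Family",
                  "Order", "Genus", "Species", "Genus", "Species"))
dimnames(m1) <- list(paste0("OTU", 1:10), c(ranks, "Lowest"))

# Hypothetical helper: mask every rank that comes after the one named in 'lowest'.
mask_row <- function(x, lowest) {
  first <- match(lowest, names(x)) + 1   # position of the first rank to mask
  if (first <= length(x)) x[first:length(x)] <- "unknown"
  x
}

# Apply row by row, re-attaching the 'Lowest' value at the end of each row.
res <- t(apply(m1, 1, function(x) c(mask_row(x[-8], x[8]), x[8])))
res["OTU6", ]   # Family, Genus and Species are "unknown" for this row
```

The explicit length check plays the same role as the 0 index above: rows whose lowest rank is 'Species' are returned untouched.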

Another option avoids the row-wise function. For each row, build the set of column indexes to replace by matching the last column of 'm1' against the column names. Then cbind the row numbers with those column indexes to form a two-column index matrix, and assign 'unknown' to those cells of 'm1'.

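# for each row: the columns after 'Lowest' up to column 7 (0 if there are none)
lst <- Map(function(x, y) if (x > y) 0 else x:y,
           match(m1[, 8], colnames(m1)[-8]) + 1, 7)
# pair each row number with its columns; index rows containing 0 are ignored
m1[cbind(rep(seq_len(nrow(m1)), lengths(lst)), unlist(lst))] <- "unknown"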
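
A fully vectorized variant is also possible. This is a sketch of a different technique from the two above: it compares each cell's column number, from col(), with the position of that row's lowest trusted rank. The small hand-made matrix and the names 'ranks', 'lowest_pos' and 'taxa' are assumptions for illustration only.

```r
# Assumption: a tiny hand-made matrix stands in for 'm1'.
ranks <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")
m1 <- matrix(letters[1:21], nrow = 3, ncol = 7, byrow = TRUE,
             dimnames = list(paste0("OTU", 1:3), ranks))
m1 <- cbind(m1, Lowest = c("Genus", "Order", "Species"))

# Column position of each row's lowest trusted rank.
lowest_pos <- match(m1[, "Lowest"], ranks)

# col(taxa) holds each cell's column number; lowest_pos has one entry per row,
# so it recycles down the rows and each cell is compared with its own row's value.
taxa <- m1[, ranks]
taxa[col(taxa) > lowest_pos] <- "unknown"
m1[, ranks] <- taxa
m1
```

This relies on R storing matrices column by column, so a vector with one entry per row lines up with the rows when it is recycled.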

Upvotes: 1
