Reputation: 49
I have a problem with my dataset. Here is my data:
df <- data.frame(OTU=c(1,2,3),
Domain=c("Bacteria", "Bacteria", "Archaea"),
Phylum= c("Atribacteria", "Proteobacteria", "uncultured archaea"),
Class =c("JS1", "uncultured bacterium", "uncultured archaea"),
Order=c("uncultured bacterium", "uncultured",
"Ambiguous_taxa"),
Family=c("uncultured bacterium", "uncultured",
"Ambiguous_taxa"), stringsAsFactors = FALSE)
df
OTU Domain Phylum Class Order Family
1 1 Bacteria Atribacteria JS1 uncultured bacterium uncultured bacterium
2 2 Bacteria Proteobacteria uncultured bacterium uncultured uncultured
3 3 Archaea uncultured archaea uncultured archaea Ambiguous_taxa Ambiguous_taxa
Summary of my question: I would like to replace every value that starts with "uncultured" or "Ambiguous" with information from the column to its left. If several neighbouring columns contain "uncultured" or "Ambiguous" values, the replacement should come from the nearest column to the left that has a specific name. For example, in the third row the Order column contains "Ambiguous_taxa", and Class and Phylum are uncultured as well, so this row should take its name from Domain, the first column to the left without "uncultured" or "Ambiguous" in it. All columns from Phylum rightwards should therefore become "uncultured Archaea". Here is the output table that I want to see:
OTU Domain Phylum Class Order Family
1 1 Bacteria Atribacteria JS1 uncultured JS1 uncultured JS1
2 2 Bacteria Proteobacteria uncultured Proteobacteria uncultured Proteobacteria uncultured Proteobacteria
3 3 Archaea uncultured Archaea uncultured Archaea uncultured Archaea uncultured Archaea
I have tried to do this in a for loop but failed. I am getting warnings and nothing changes. I am fairly new to R. The idea is to find the "uncultured" pattern with grep in the Order column and, for every match, change the value to, for example, "uncultured JS1" using the paste function.
Changing_uncultured <- function(DATA) {
  for (i in 1:length(DATA$Order)) {
    if (grep("uncultured", DATA$Order)) {
      DATA$Order[i] <- paste("uncultured", DATA$Class[i])
    }
  }
}
Changing_uncultured(DATA=df)
Thanks in advance. Sorry for the edits; it was my fault that I did not consider that the uncultured names can start in any column. The example now reflects the actual data.
Upvotes: 2
Views: 81
Reputation: 39747
In base R you can test with grepl and sapply where you have a match with ^uncultured|^Ambiguous. With apply and any you get the rows where you have a hit, and with which.max the first matching column in each row. Then you simply have to overwrite those cells:
df <- data.frame(OTU=c(1,2,3),
Domain=c("Bacteria", "Bacteria", "Archaea"),
Phylum= c("Atribacteria", "Proteobacteria", "uncultured archaea"),
Class =c("JS1", "uncultured bacterium", "uncultured archaea"),
Order=c("uncultured bacterium", "uncultured",
"Ambiguous_taxa"),
Family=c("uncultured bacterium", "uncultured",
"Ambiguous_taxa"), stringsAsFactors = FALSE)
t1 <- sapply(df, grepl, pattern="^uncultured|^Ambiguous")
t2 <- apply(t1, 1, any)
t3 <- apply(t1, 1, which.max)
for(i in seq_len(nrow(df))) {
if(t2[i]) {df[i, t3[i]:ncol(df)] <- paste("uncultured", df[i, t3[i]-1])}
}
df
# OTU Domain Phylum Class Order Family
#1 1 Bacteria Atribacteria JS1 uncultured JS1 uncultured JS1
#2 2 Bacteria Proteobacteria uncultured Proteobacteria uncultured Proteobacteria uncultured Proteobacteria
#3 3 Archaea uncultured Archaea uncultured Archaea uncultured Archaea uncultured Archaea
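If this has to run on several tables, the loop above could be wrapped in a function. The following is only a sketch: the name fill_uncultured and its pattern argument are made up for illustration, and it assumes that a placeholder in the very first column has nothing to its left to borrow from, so such rows are left untouched. The function returns the modified copy, so the result has to be assigned. The function in the question changes only a local copy and never returns it, which is one reason it appeared to do nothing.

```r
# Hypothetical helper; the name and the `pattern` argument are illustrative.
fill_uncultured <- function(data, pattern = "^uncultured|^Ambiguous") {
  # Logical matrix with one column per data column:
  # TRUE where a cell starts with a placeholder prefix.
  hit <- do.call(cbind, lapply(data, grepl, pattern = pattern))
  for (i in seq_len(nrow(data))) {
    first <- which(hit[i, ])[1]  # first placeholder column, NA if the row has none
    # Skip rows without a hit and rows with no column to the left of the hit.
    if (!is.na(first) && first > 1) {
      data[i, first:ncol(data)] <- paste("uncultured", data[i, first - 1])
    }
  }
  data  # return the modified copy; the caller's data frame is not changed
}

df <- data.frame(OTU = c(1, 2, 3),
                 Domain = c("Bacteria", "Bacteria", "Archaea"),
                 Phylum = c("Atribacteria", "Proteobacteria", "uncultured archaea"),
                 Class = c("JS1", "uncultured bacterium", "uncultured archaea"),
                 Order = c("uncultured bacterium", "uncultured", "Ambiguous_taxa"),
                 Family = c("uncultured bacterium", "uncultured", "Ambiguous_taxa"),
                 stringsAsFactors = FALSE)

res <- fill_uncultured(df)
res$Family
# "uncultured JS1" "uncultured Proteobacteria" "uncultured Archaea"
```

Building the match matrix with do.call(cbind, ...) instead of sapply is a deliberate choice in this sketch: it should keep the result a matrix even for a one-row data frame, where sapply would simplify to a plain vector.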
Answer before the Question-update:
df <- data.frame(OTU=c(1,2,3),
Domain=c("Bacteria", "Bacteria", "Bacteria"),
Phylum= c("Atribacteria", "Proteobacteria", "Y"),
Class =c("JS1", "X", "JS2"),
Order=c("uncultured bacterium", "uncultured",
"Ambiguous_taxa"),
Family=c("uncultured bacterium", "uncultured",
"Ambiguous_taxa"), stringsAsFactors = FALSE)
tt <- apply(sapply(df[,c("Order", "Family")], grepl, pattern="^uncultured|^Ambiguous"), 1, any) #get rows to replace
df[tt, c("Order", "Family")] <- paste("uncultured", df$Class[tt])
df
# OTU Domain Phylum Class Order Family
#1 1 Bacteria Atribacteria JS1 uncultured JS1 uncultured JS1
#2 2 Bacteria Proteobacteria X uncultured X uncultured X
#3 3 Bacteria Y JS2 uncultured JS2 uncultured JS2
Upvotes: 1
Reputation: 887981
We can use tidyverse options:
library(tidyverse)
df %>%
  mutate_at(vars(Order, Family),
    # match the prefixes case-sensitively at the start of the value,
    # and keep a space between "uncultured" and the Class name
    ~ case_when(str_detect(., '^uncultured|^Ambiguous') ~ str_c(
      'uncultured ', Class), TRUE ~ .))
# OTU Domain Phylum Class Order Family
#1 1 Bacteria Atribacteria JS1 uncultured JS1 uncultured JS1
#2 2 Bacteria Proteobacteria X uncultured X uncultured X
#3 3 Bacteria Y JS2 uncultured JS2 uncultured JS2
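Since dplyr 1.0.0, mutate_at has been superseded by across(), so the same idea could also be written as below. This is only a sketch, assuming dplyr (1.0.0 or later) and stringr are installed. It anchors the pattern at the start of the value, matches the capital "A" in "Ambiguous_taxa", and puts a space after "uncultured". The name res is just for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(OTU = c(1, 2, 3),
                 Domain = c("Bacteria", "Bacteria", "Bacteria"),
                 Phylum = c("Atribacteria", "Proteobacteria", "Y"),
                 Class = c("JS1", "X", "JS2"),
                 Order = c("uncultured bacterium", "uncultured", "Ambiguous_taxa"),
                 Family = c("uncultured bacterium", "uncultured", "Ambiguous_taxa"),
                 stringsAsFactors = FALSE)

res <- df %>%
  mutate(across(c(Order, Family),
                # .x is the current column; Class is looked up in the same data frame
                ~ if_else(str_detect(.x, "^(uncultured|Ambiguous)"),
                          str_c("uncultured ", Class),
                          .x)))

res$Order
# "uncultured JS1" "uncultured X" "uncultured JS2"
```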
df <- data.frame(OTU=c(1,2,3),
Domain=c("Bacteria", "Bacteria", "Bacteria"),
Phylum= c("Atribacteria", "Proteobacteria", "Y"),
Class =c("JS1", "X", "JS2"),
Order=c("uncultured bacterium", "uncultured",
"Ambiguous_taxa"),
Family=c("uncultured bacterium", "uncultured",
"Ambiguous_taxa"), stringsAsFactors = FALSE)
Upvotes: 1
Reputation: 389335
We can use lapply to loop over multiple columns. Select the columns by index or by name, find the values that start with "uncultured" or "Ambiguous", and replace them with "uncultured" followed by the corresponding Class value from the same row.
cols <- 5:6
#Or
#cols <- c("Order", "Family")
df[cols] <- lapply(df[cols], function(x) {
inds <- grep("^uncultured|^Ambiguous", x)
x[inds] <- paste0("uncultured ", df$Class[inds])
x
})
df
# OTU Domain Phylum Class Order Family
#1 1 Bacteria Atribacteria JS1 uncultured JS1 uncultured JS1
#2 2 Bacteria Proteobacteria X uncultured X uncultured X
#3 3 Bacteria Y JS2 uncultured JS2 uncultured JS2
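As a side note on why grep works for indexing here but not inside the if() from the question: grep returns the positions of the matching elements, while grepl returns one TRUE/FALSE per element. A small illustration, where the vector x and the names pos and flag are made up for the example:

```r
x <- c("uncultured bacterium", "JS1", "Ambiguous_taxa")

pos  <- grep("^uncultured|^Ambiguous", x)   # integer positions of the matches
flag <- grepl("^uncultured|^Ambiguous", x)  # logical vector, same length as x

pos   # 1 3
flag  # TRUE FALSE TRUE

# Both can be used for subsetting and select the same elements.
identical(x[pos], x[flag])  # TRUE
```

if() expects a single TRUE/FALSE, so inside a loop one would test grepl(pattern, x[i]) for the current element rather than pass the whole column to grep. Passing a longer condition gives a warning in older R versions and an error from R 4.2.0 on.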
data
df <- data.frame(OTU=c(1,2,3),
Domain=c("Bacteria", "Bacteria", "Bacteria"),
Phylum= c("Atribacteria", "Proteobacteria", "Y"),
Class =c("JS1", "X", "JS2"),
Order=c("uncultured bacterium", "uncultured",
"Ambiguous_taxa"),
Family=c("uncultured bacterium", "uncultured",
"Ambiguous_taxa"), stringsAsFactors = FALSE)
Upvotes: 2