Reputation: 77
I am trying to automate a process in R, because checking about 5000 rows manually is not practical.
Here is a toy example to make the process clearer.
I have compared 5 methods that classify reads to species.
Consider, for example, the first 5 cases:
code <- sprintf("sample % d", 1:5)
Specie_methodA<- c("NA", "NA","NA","NA", "Escherichia coli")
Specie_methodB<- c("Methanobrevibacter smithii", "NA", "NA","Blautia faecis","NA")
Specie_methodC<- c("","","","Blautia faecis","")
Specie_methodD<-c("NA","NA","CAG-41_sp900066215","NA","")
Specie_methodE<-c("","","","","Campylobacter coli")
table <- data.frame(code, Specie_methodA, Specie_methodB, Specie_methodC, Specie_methodD, Specie_methodE)
For each row, I would like to check whether a species was obtained and, if so, write its name in a new column (desired_output in table2, see code below). If the 5 methods give two different species within a row, I want the string "ERROR" as output. If none of the 5 methods detects a species, the output should be "NA".
Therefore, from the table above, I would like to obtain the following output:
desired_output<-c("Methanobrevibacter smithii", "NA","CAG-41_sp900066215","Blautia faecis","ERROR")
table2 <- data.frame(code, Specie_methodA, Specie_methodB, Specie_methodC, Specie_methodD, Specie_methodE,desired_output)
Upvotes: 1
Views: 118
Reputation: 3902
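We can create a user-defined function:
library(dplyr)  # provides %>%, mutate(), across() and rowwise()

get_desired_output <- function(specie1, specie2, specie3, specie4, specie5) {
  species <- c(specie1, specie2, specie3, specie4, specie5)
  # remove empty strings, "NA" strings and duplicates
  species <- species[!(species %in% c('NA', ''))] %>% unique()
  # no method reported a species
  if (length(species) == 0) {
    return('NA')
  }
  # the methods disagree
  if (length(species) > 1) {
    return('ERROR')
  }
  # exactly one distinct species was reported
  return(species)
}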
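As a quick sanity check, the helper can be called directly on a few rows of the toy data. The sketch below repeats the same logic without the pipe, so it runs in base R with no packages loaded; the variable names row1, row2 and row5 are only for illustration.

# Same logic as the helper above, written without %>% so no package is needed.
get_desired_output <- function(specie1, specie2, specie3, specie4, specie5) {
  species <- c(specie1, specie2, specie3, specie4, specie5)
  species <- unique(species[!(species %in% c("NA", ""))])
  if (length(species) == 0) return("NA")
  if (length(species) > 1) return("ERROR")
  species
}

# Values taken from rows 1, 2 and 5 of the toy table
row1 <- get_desired_output("NA", "Methanobrevibacter smithii", "", "NA", "")
row2 <- get_desired_output("NA", "NA", "", "NA", "")
row5 <- get_desired_output("Escherichia coli", "NA", "", "", "Campylobacter coli")
print(c(row1, row2, row5))
# [1] "Methanobrevibacter smithii" "NA"                         "ERROR"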
If dplyr >= 1.0.0:
output <- table %>%
  mutate(across(Specie_methodA:Specie_methodE, as.character)) %>%
  rowwise() %>%
  mutate(desired_output = get_desired_output(Specie_methodA, Specie_methodB, Specie_methodC,
                                             Specie_methodD, Specie_methodE))
If dplyr < 1.0.0:
output <- table %>%
  mutate_at(vars(Specie_methodA:Specie_methodE), as.character) %>%
  rowwise() %>%
  mutate(desired_output = get_desired_output(Specie_methodA, Specie_methodB, Specie_methodC,
                                             Specie_methodD, Specie_methodE))
> output
Source: local data frame [5 x 7]
Groups: <by row>
# A tibble: 5 x 7
code Specie_methodA Specie_methodB Specie_methodC Specie_methodD Specie_methodE desired_output
<fct> <chr> <chr> <chr> <chr> <chr> <chr>
1 sample ~ NA Methanobrevibacter ~ "" NA "" Methanobrevibacter ~
2 sample ~ NA NA "" NA "" NA
3 sample ~ NA NA "" CAG-41_sp900066~ "" CAG-41_sp900066215
4 sample ~ NA Blautia faecis Blautia faecis NA "" Blautia faecis
5 sample ~ Escherichia co~ NA "" "" Campylobacter c~ ERROR
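If dplyr is not available, roughly the same result can probably be obtained in base R with apply() over the method columns. This is only a sketch under a few assumptions: the names species_table, collapse_species and method_cols are made up for the example, the method columns are assumed to share the Specie_method prefix, and real missing values (NA) are treated like the "NA" strings, which goes slightly beyond the question.

# Toy data from the question (species_table is a hypothetical name, chosen to avoid masking base::table)
species_table <- data.frame(
  code = sprintf("sample %d", 1:5),
  Specie_methodA = c("NA", "NA", "NA", "NA", "Escherichia coli"),
  Specie_methodB = c("Methanobrevibacter smithii", "NA", "NA", "Blautia faecis", "NA"),
  Specie_methodC = c("", "", "", "Blautia faecis", ""),
  Specie_methodD = c("NA", "NA", "CAG-41_sp900066215", "NA", ""),
  Specie_methodE = c("", "", "", "", "Campylobacter coli"),
  stringsAsFactors = FALSE
)

# Collapse one row of calls into a single label (hypothetical helper)
collapse_species <- function(x) {
  # keep distinct calls, ignoring real NA, "NA" strings and empty strings
  found <- unique(x[!is.na(x) & !(x %in% c("NA", ""))])
  if (length(found) == 0) {
    "NA"        # no method reported a species
  } else if (length(found) > 1) {
    "ERROR"     # the methods disagree
  } else {
    found       # exactly one distinct species
  }
}

# Select the method columns by their shared prefix and apply the helper row by row
method_cols <- grep("^Specie_method", names(species_table), value = TRUE)
species_table$desired_output <- apply(species_table[method_cols], 1, collapse_species)
print(species_table$desired_output)

For a 5000-row table this row-wise apply() should still be fast enough, and it avoids listing every method column by hand.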
Upvotes: 1