How to proceed to the next iteration when a subset has zero rows in a for loop in R ("argument is of length zero")?

Question

Let me explain this question with an example. I have three data frames:

df1: the main, large table, which contains all the information.

  df1 <- data.frame(Gene=c(1,2,3,4,5,6,7,8),
              Description=c("ribonuclease HII", "glycerol-3-phosphate dehydrogenase", "Arginyl-tRNA synthetase (EC 6.1.1.19) 17855:19195", "Arginyl-tRNA synthetase (EC 6.1.1.19)", "PAS domain S-box protein", "ribonuclease HII", "Isoleucyl-tRNA synthetase", "Succinyl-CoA ligase"),
              Species=c("aa", "bb","aa","cc","ee","ff","aa","dd"),
              Number1= c(1,0,3,20,99,100,31,123),
              Number2 =c(1000, 12636,12,455,231,454,123,1), stringsAsFactors = FALSE)


> df1
  Gene                                       Description Species Number1 Number2
1    1                                  ribonuclease HII      aa       1    1000
2    2                glycerol-3-phosphate dehydrogenase      bb       0   12636
3    3 Arginyl-tRNA synthetase (EC 6.1.1.19) 17855:19195      aa       3      12
4    4             Arginyl-tRNA synthetase (EC 6.1.1.19)      cc      20     455
5    5                          PAS domain S-box protein      ee      99     231
6    6                                  ribonuclease HII      ff     100     454
7    7                         Isoleucyl-tRNA synthetase      aa      31     123
8    8                               Succinyl-CoA ligase      dd     123       1

df2 and df3 are subsets of df1, produced by filtering with grepl and regular expressions:

 df2 <- data.frame(Gene=c(1,2,3,4,5,6),
              Description=c("ribonuclease HII", "glycerol-3-phosphate dehydrogenase", "glycerol-3-phosphate dehydrogenase", "Arginyl-tRNA synthetase (EC 6.1.1.19)", "PAS domain S-box protein", "glycerol-3-phosphate dehydrogenase"),
              Species=c("aa", "bb","aa","cc","ee","ff"),
              Number1= c(1,0,3,20,99,100),
              Number2 =c(1000, 12636,12,455,231,454), stringsAsFactors = FALSE)

df3 <- data.frame(Gene=c(1,2,3,4,5,6),
                  Description=c("ribonuclease HII", "nitrite reductase large subunit", "Arginyl-tRNA synthetase (EC 6.1.1.19) 17855:19195", "Cytochrome cd1 nitrite reductase (EC:1.7.2.1)", "PAS domain S-box protein", "nitrite reductase large subunit"),
                  Species=c("aa", "bb","aa","cc","dd", "ff"),
                  Number1= c(1,0,3,20,99,100),
                  Number2 =c(1000, 12636,12,455,231,454), stringsAsFactors = FALSE)



> df2
  Gene                           Description Species Number1 Number2
1    1                      ribonuclease HII      aa       1    1000
2    2    glycerol-3-phosphate dehydrogenase      bb       0   12636
3    3    glycerol-3-phosphate dehydrogenase      aa       3      12
4    4 Arginyl-tRNA synthetase (EC 6.1.1.19)      cc      20     455
5    5              PAS domain S-box protein      ee      99     231
6    6    glycerol-3-phosphate dehydrogenase      ff     100     454
> df3
  Gene                                       Description Species Number1 Number2
1    1                                  ribonuclease HII      aa       1    1000
2    2                   nitrite reductase large subunit      bb       0   12636
3    3 Arginyl-tRNA synthetase (EC 6.1.1.19) 17855:19195      aa       3      12
4    4     Cytochrome cd1 nitrite reductase (EC:1.7.2.1)      cc      20     455
5    5                          PAS domain S-box protein      dd      99     231
6    6                   nitrite reductase large subunit      ff     100     454

Summary of my question:

For every species in df1, I would like to search df2 and df3 for a specific "Description". If the species has that Description in both data frames, I want to return a table with a new column that reads "complete pathway" next to every species passing this criterion. If the Description exists only in df2, the new column should read "incomplete pathway". If the species has it in neither data frame, the loop should proceed to the next species and write "No occurrences" in the new column. At the end, I would like a table containing this new information.

Here is what I have tried (I selected one description for df2 and one for df3, namely "glycerol-3-phosphate dehydrogenase" and "nitrite reductase large subunit", respectively):

 for(i in unique(df1$Species)) {
  x = subset(df2, Species == i & Description == "glycerol-3-phosphate dehydrogenase")
  y = subset(df3, Species == i & Description == "nitrite reductase large subunit")
  if (!is.na(x$Species) & !is.na(y$Species)){
  print(i, "complete pathway")
  }
  else if(!is.na(x$Species) & is.na(y$Species)){
  print(i, "incomplete pathway")
  }
  else if (is.na(x$Species) & is.na(y$Species)){next}

  }

However, it throws an error: Error in if (!is.na(x$Species) & !is.na(y$Species)) { : argument is of length zero
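For reference, a minimal sketch of where a zero-length condition like this can come from. The data frame df2_demo is a hypothetical two-row stand-in, not the real df2. When subset() matches no rows, x$Species is empty, so is.na() returns logical(0) rather than TRUE or FALSE, and if() cannot evaluate it.

```r
# Hypothetical stand-in for df2 in which species "cc" has no matching row.
df2_demo <- data.frame(
  Species = c("aa", "bb"),
  Description = c("glycerol-3-phosphate dehydrogenase", "ribonuclease HII"),
  stringsAsFactors = FALSE
)
x <- subset(df2_demo,
            Species == "cc" & Description == "glycerol-3-phosphate dehydrogenase")

nrow(x)                    # the subset is empty
cond <- !is.na(x$Species)  # is.na() of an empty vector is logical(0)
length(cond)               # zero-length, so if (cond) would stop with the error

# A scalar test such as nrow(x) > 0 always has length one.
has_match <- nrow(x) > 0
has_match
```

Testing nrow(x) > 0 (or length(x$Species) > 0) gives if() a single TRUE or FALSE whether or not any rows matched.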

The expected output should be a new table (let's say df4):

df4 <- data.frame(Species=c("aa", "bb","cc","ee","ff", "dd"),
              New.Table=c("Incomplete p.", "Complete p.","No occurences","No occurences","Incomplete p.", "No occurences"), stringsAsFactors = FALSE)
  Species     New.Table
1      aa Incomplete p.
2      bb   Complete p.
3      cc No occurences
4      ee No occurences
5      ff Incomplete p.
6      dd No occurences

Thanks in advance. I am also open to suggestions for the title and for edits to the text!

Arkning · Accepted Answer

Since you have duplicates, the function all() lets me check whether every description of a species in df1 is in df2 or df3. This is a sample of the solution I came up with; tell me if this is what you expect:

my_species <- unique(df1$Species)
my_data_species <- data.frame(Species = my_species, stringsAsFactors = FALSE)
my_function <- function(x) {
  # Descriptions recorded in df1 for the x-th species
  desc <- df1[df1$Species == my_species[x], "Description"]
  # TRUE only if every one of those descriptions also appears in df2 / df3
  in_df2 <- all(desc %in% df2$Description)
  in_df3 <- all(desc %in% df3$Description)
  if (in_df2 && in_df3) {
    my_data_species[x, "New Table"] <<- "complete pathway"
  } else if (in_df2 || in_df3) {
    my_data_species[x, "New Table"] <<- "incomplete pathway"
  } else {
    my_data_species[x, "New Table"] <<- "No occurences"
  }
}
# <<- updates my_data_species in the global environment; the list returned by lapply() is not needed
invisible(lapply(seq_along(my_species), my_function))
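As an alternative, here is a minimal vectorised sketch that follows the rule in the question directly: one target description in df2, one in df3, checked per species. It is an illustration under stated assumptions, not the method used above. The data frames are trimmed copies holding only the two columns the rule needs, and `species` stands in for unique(df1$Species). The question does not say what to do with a species found only in df3; this sketch assumes it counts as "No occurrences".

```r
# Trimmed copies of the question's data (Species and Description only).
species <- c("aa", "bb", "cc", "ee", "ff", "dd")  # stands in for unique(df1$Species)
df2 <- data.frame(
  Species = c("aa", "bb", "aa", "cc", "ee", "ff"),
  Description = c("ribonuclease HII", "glycerol-3-phosphate dehydrogenase",
                  "glycerol-3-phosphate dehydrogenase",
                  "Arginyl-tRNA synthetase (EC 6.1.1.19)",
                  "PAS domain S-box protein",
                  "glycerol-3-phosphate dehydrogenase"),
  stringsAsFactors = FALSE
)
df3 <- data.frame(
  Species = c("aa", "bb", "aa", "cc", "dd", "ff"),
  Description = c("ribonuclease HII", "nitrite reductase large subunit",
                  "Arginyl-tRNA synthetase (EC 6.1.1.19) 17855:19195",
                  "Cytochrome cd1 nitrite reductase (EC:1.7.2.1)",
                  "PAS domain S-box protein",
                  "nitrite reductase large subunit"),
  stringsAsFactors = FALSE
)

# Which species carry each target description? %in% always returns one
# TRUE/FALSE per species, so no zero-length condition can arise.
in_df2 <- species %in% df2$Species[df2$Description == "glycerol-3-phosphate dehydrogenase"]
in_df3 <- species %in% df3$Species[df3$Description == "nitrite reductase large subunit"]

# Three-way rule; a df3-only species falls through to "No occurrences" (assumption).
status <- ifelse(in_df2 & in_df3, "complete pathway",
          ifelse(in_df2, "incomplete pathway", "No occurrences"))

df4 <- data.frame(Species = species, New.Table = status, stringsAsFactors = FALSE)
df4
```

With this data, aa comes out as incomplete, bb and ff as complete, and cc, ee and dd as "No occurrences". Note that ff has both target descriptions, so under the rule as written it is labelled complete, which differs from the expected table in the question.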
