Get the rows that include specific strings

Question

I have data like this:

df<- structure(list(Groups = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
    No = c(8L, 4L, 9L, 2L, 7L, 3L, 5L, 1L, 2L), NO1 = c(1L, 1L, 
    1L, 2L, 1L, 1L, 1L, 1L, 1L), Accessions = structure(c(6L, 
    5L, 1L, 3L, 2L, 7L, 4L, 9L, 8L), .Label = c("E9PCL5", "P00367", 
    "P05783", "P63104", "Q6DD88", "Q6FI13", "Q6P597-3", "Q7Z406-6", 
    "Q9BUA3"), class = "factor"), Accessions2 = structure(c(6L, 
    2L, 1L, 4L, 3L, 7L, 5L, 9L, 8L), .Label = c("B4DIW5; F8WA69; P56945; E9PCV2; F5GXA2; F5H855; E9PCL5; F5GXV6; F5H7Z0", 
    "F5GWF8; F5H6I7; Q6DD88", "P00367; B3KV55; B4DGN5; P49448; F5GYQ4; H0YFJ0; F8WA20", 
    "P05783; F8VZY9", "P63104; E7EX24; H0YB80; B0AZS6; B7Z2E6", 
    "Q16777; Q99878; F8WA69; H0YFX9; Q9BTM1; P20671; P0C0S8; Q6FI13", 
    "Q6P597-3; Q6P597-2; Q6P597", "Q7Z406-2; Q7Z406-6", "Q9BUA3"
    ), class = "factor"), NO3 = c(1L, 0L, 0L, 0L, 1L, 0L, 1L, 
    0L, 0L)), .Names = c("Groups", "No", "NO1", "Accessions", 
"Accessions2", "NO3"), class = "data.frame", row.names = c(NA, 
-9L))

I am trying to find the rows that contain specific strings in the Accessions2 column and then sum NO1 over those rows.

For example, I want to know which rows contain F8WA69 and which rows contain Q9BUA3. For F8WA69 the result would be

Groups  No  NO1 Accessions  Accessions2 NO3
1   8   1   Q6FI13  Q16777; Q99878; F8WA69; H0YFX9; Q9BTM1; P20671; P0C0S8; Q6FI13  1
1   9   1   E9PCL5  B4DIW5; F8WA69; P56945; E9PCV2; F5GXA2; F5H855; E9PCL5; F5GXV6; F5H7Z0  0

and for Q9BUA3

Groups  No  NO1 Accessions  Accessions2 NO3
1   3   1   Q6P597-3    Q6P597-3; Q6P597-2; Q9BUA3  0
1   1   1   Q9BUA3  Q9BUA3  0
1   2   1   Q7Z406-6    Q7Z406-2; Q9BUA3    0

Then I want to sum NO1 for each of them.

The first sum is 2 and the second is 3.

pogibas · Accepted Answer

You can use grepl or grep to find the rows where an ID is present.

For example:

grepl("F8WA69", df$Accessions2)
[1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

To subset your data use:

df[grepl("F8WA69", df$Accessions2), ]
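
The subsetting above uses grepl; a minimal sketch of the grep variant follows, on a small stand-in data frame (toy and its values are made up for illustration, not taken from df). grep returns the positions of the matching rows rather than a logical vector, and either form can be used as a row index.

```r
# Hypothetical three-row stand-in for the question's df
toy <- data.frame(
  NO1 = c(1L, 1L, 2L),
  Accessions2 = c("Q16777; F8WA69; Q6FI13", "F5GWF8; Q6DD88", "B4DIW5; F8WA69"),
  stringsAsFactors = FALSE
)

hits_lgl <- grepl("F8WA69", toy$Accessions2)  # logical vector, one entry per row
hits_idx <- grep("F8WA69", toy$Accessions2)   # integer positions of matching rows

# The TRUE positions of grepl are exactly the indices grep returns
same_rows <- identical(which(hits_lgl), hits_idx)

# Either vector can index the rows
toy[hits_idx, ]
```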

And if you want to iterate over multiple IDs and sum NO1, you can use sapply:

sapply(c("F8WA69", "Q9BUA3"),
       function(x) sum(df[grepl(x, df$Accessions2), ]$NO1))
F8WA69 Q9BUA3 
     2      1
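
One caveat with grepl is that it matches substrings, so a shorter ID such as Q6P597 would also match Q6P597-3. If whole-ID matching is needed, a possible sketch is to split each cell on the "; " separator and compare tokens exactly. The toy data frame and the has_id helper below are hypothetical names introduced for illustration, and the code assumes IDs are always separated by a semicolon with optional whitespace.

```r
# Hypothetical stand-in for the question's df (a few made-up rows in the same layout)
toy <- data.frame(
  NO1 = c(1L, 1L, 2L, 1L),
  Accessions2 = c(
    "Q16777; Q99878; F8WA69; Q6FI13",
    "B4DIW5; F8WA69; P56945",
    "P05783; F8VZY9",
    "Q9BUA3"
  ),
  stringsAsFactors = FALSE
)

# Split each cell on ";" plus optional whitespace so IDs are compared whole
# (as.character() also covers the case where the column is a factor)
tokens <- strsplit(as.character(toy$Accessions2), ";\\s*")

# has_id() is a hypothetical helper: TRUE where `id` is one of the row's tokens
has_id <- function(id) vapply(tokens, function(tk) id %in% tk, logical(1))

# "F8VZY" is only a prefix of "F8VZY9", so it should not count as a match here
ids <- c("F8WA69", "Q9BUA3", "F8VZY")
sums <- vapply(ids, function(id) sum(toy$NO1[has_id(id)]), numeric(1))
sums
```

With this toy data, grepl("F8VZY", toy$Accessions2) would flag the third row, while the token comparison does not.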
