Reputation: 218

How to count how many times values appear in a dataframe per row in R?

I have a dataset of genes and each gene's interacting genes. I have this in 2 columns like this:

Gene    Interacting Genes
ACE     BRCA2, NOS2, SEPT9
HER2    AGT, TGRF
YUO     SEPT9, NOS2

Separately, I have a dataset which is just a list of genes. I am looking to create a count column of how many interacting genes per gene are also in my second dataset. My second dataset looks like:

Gene
NOS2
SEPT9
QRTY

Output from this example would look like:

Gene    Interacting Genes     Count
ACE     BRCA2, NOS2, SEPT9    2
HER2    AGT, TGRF             0
YUO     SEPT9, NOS2           2

#NOS2 and SEPT9 are in the gene list dataframe and so are counted

I've seen similar questions, but none that count matches within a string for each row. This is the part I am stuck on.

Input data:

#df1:
structure(list(Gene = c("ACE", "HER2", "YUO"), interactors = c("BRCA2, NOS2, SEPT9", 
"AGT,   TGRF", 
"SEPT9,  NOS2"
)), row.names = c(NA, -3L), class = c("data.table", "data.frame"
))

#df2:
structure(list(Gene = c("NOS2", "SEPT9", "QRTY")), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"))

Upvotes: 0

Answers (2)

rjen

Reputation: 1982

You can use a solution based on dplyr and stringr.

library(dplyr)
library(stringr)

df1 %>%
  mutate(count = str_count(interactors, str_c(df2$Gene, collapse = '|')))

#   Gene        interactors count
# 1  ACE BRCA2, NOS2, SEPT9     2
# 2 HER2        AGT,   TGRF     0
# 3  YUO       SEPT9,  NOS2     2
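One caveat with a plain alternation pattern: it also matches inside longer names, so a hypothetical interactor such as "NOS21" would be counted as a hit for "NOS2". A minimal sketch of one way around this, assuming the gene symbols are plain alphanumeric strings with no regex metacharacters, is to wrap the alternation in word boundaries. The data frames are rebuilt here as plain data.frames so the snippet runs on its own, and the "NOS21" string is made up purely for illustration.

```r
library(stringr)

# Example data from the question, rebuilt as plain data.frames
df1 <- data.frame(
  Gene = c("ACE", "HER2", "YUO"),
  interactors = c("BRCA2, NOS2, SEPT9", "AGT,   TGRF", "SEPT9,  NOS2")
)
df2 <- data.frame(Gene = c("NOS2", "SEPT9", "QRTY"))

# Plain alternation: "NOS2|SEPT9|QRTY"
plain <- str_c(df2$Gene, collapse = "|")

# Same alternation wrapped in word boundaries: "\\b(NOS2|SEPT9|QRTY)\\b"
# Assumption: gene symbols contain no regex metacharacters
bounded <- str_c("\\b(", plain, ")\\b")

df1$count <- str_count(df1$interactors, bounded)

# Hypothetical string with a longer symbol that merely starts with "NOS2"
partial_example <- "NOS21, SEPT9"
plain_hits   <- str_count(partial_example, plain)    # counts "NOS2" inside "NOS21"
bounded_hits <- str_count(partial_example, bounded)  # only counts "SEPT9"
```

On the question's data both patterns give the same counts; they only differ when one symbol is a prefix or substring of another.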

Upvotes: 2

Karthik S

Reputation: 11546

Using str_extract_all:

library(dplyr)
library(stringr)

df1 %>%
  mutate(counter = str_extract_all(interactors, paste0(df2$Gene, collapse = '|'))) %>%
  rowwise() %>%
  mutate(count = length(counter)) %>%
  select(-counter)

# # A tibble: 3 x 3
# # Rowwise:
#   Gene  interactors        count
#   <chr> <chr>              <int>
# 1 ACE   BRCA2, NOS2, SEPT9     2
# 2 HER2  AGT,   TGRF            0
# 3 YUO   SEPT9,  NOS2           2
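As a possible simplification, the rowwise() step can be skipped: str_extract_all() returns a list with one character vector per row, and base R's lengths() gives the length of each list element directly. This is only a sketch of the same idea, not a change to the answer's logic; the data frames are rebuilt as plain data.frames so it runs standalone.

```r
library(stringr)

# Example data from the question, rebuilt as plain data.frames
df1 <- data.frame(
  Gene = c("ACE", "HER2", "YUO"),
  interactors = c("BRCA2, NOS2, SEPT9", "AGT,   TGRF", "SEPT9,  NOS2")
)
df2 <- data.frame(Gene = c("NOS2", "SEPT9", "QRTY"))

# One character vector of matches per row of df1
matches <- str_extract_all(df1$interactors, paste0(df2$Gene, collapse = "|"))

# lengths() returns the number of matches in each list element
df1$count <- lengths(matches)
```

Keeping the intermediate matches list around can also be handy if you want to see which genes matched, not just how many.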

Upvotes: 0
