WindSur

Reputation: 140

Number of occurrences in a dataframe

I have the following data frame. I want to count the occurrences in each row, keyed by the first column, and append the count to the data frame as another column, say "freq":

df:

gene    a    b    c
abc     1    NA   1
bca     NA   1    1
cba     1    2    1

My real df is much bigger; this is only a small example, so the solution needs to scale.

The desired data frame is:

gene    a    b    c    freq
abc     1    NA   1     2
bca     NA   1    1     2
cba     1    2    1     3

This is the code I have tried:

g <- df %>% mutate(numtwos = rowSums(. > 0))

or

df$freq <- apply(df , 1, function(x) length(which(x>0)))

But it is not working: even when a row should have (for example) 150 repetitions, I only get 2 for every row.
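Assuming freq should be the number of non-missing values in each row (which is what the desired output shows), two things likely trip up the attempts above: both include the character gene column (apply() even turns the whole row into strings), and a single NA makes rowSums() return NA unless na.rm = TRUE is set. A minimal base R sketch, assuming a, b and c are numeric columns with real NA values; the freq_pos column name is only illustrative:

```r
# The question's example data (assumed: numeric columns with real NA values)
df <- data.frame(
  gene = c("abc", "bca", "cba"),
  a = c(1, NA, 1),
  b = c(NA, 1, 2),
  c = c(1, 1, 1)
)

# Count non-missing values per row, leaving out the gene column so it is
# never compared as a string
df$freq <- apply(df[, -1], 1, function(x) sum(!is.na(x)))

# Vectorised version of the first attempt: counts values > 0 in a:c only,
# and na.rm = TRUE stops one NA from turning the whole count into NA
df$freq_pos <- rowSums(df[, c("a", "b", "c")] > 0, na.rm = TRUE)

df
```

Both columns come out as 2, 2, 3 for this data; they differ only if a column can hold zeros or negative values.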

Any help or other point of view is welcome!

Thanks

Upvotes: 1

Views: 40

Answers (2)

akrun

Reputation: 886938

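We can first convert the "Na" strings to real NA values (which also makes the columns numeric) and then take the row sums with na.rm = TRUE: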

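library(dplyr)
df %>%
   # turn the "Na" strings into real NA and make a:c numeric; as.character()
   # is needed because recent na_if() wants y to match x's type (c is integer)
   mutate_at(vars(a:c), ~ as.numeric(na_if(as.character(.), "Na"))) %>%
   # add up the values in a:c, skipping the NAs
   mutate(freq = rowSums(select(., a:c), na.rm = TRUE))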
#  gene  a  b c freq
#1  abc  1 NA 1    2
#2  bca NA  1 1    2
#3  cba  1  1 1    3

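Here all the values are 1, so summing them gives the same result as counting the non-NA values. To count the non-NA values directly, which also works when a value is not 1 (such as the 2 in the question's cba row):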

df %>%
   mutate_at(vars(a:c), ~ as.numeric(na_if(as.character(.), "Na"))) %>%
   # count the non-NA cells in a:c instead of summing them
   mutate(freq = rowSums(!is.na(select(., a:c))))
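To see why the two versions are only interchangeable when every value is 1, here is a small sketch on the question's own data (assumed to already be numeric with real NA values, so no na_if() step is needed); the sum_vals and count_vals column names are just for illustration:

```r
library(dplyr)

# The question's data, where cba has b = 2 rather than 1
df <- data.frame(
  gene = c("abc", "bca", "cba"),
  a = c(1, NA, 1),
  b = c(NA, 1, 2),
  c = c(1, 1, 1)
)

res <- df %>%
  mutate(
    sum_vals   = rowSums(select(., a:c), na.rm = TRUE),  # adds the values up
    count_vals = rowSums(!is.na(select(., a:c)))         # counts non-NA cells
  )

res
```

The sums come out as 2, 2, 4 while the counts are 2, 2, 3, so only the counting version reproduces the desired freq column here.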

data

df <- structure(list(gene = c("abc", "bca", "cba"), a = c("1", "Na", 
"1"), b = c("Na", "1", "1"), c = c(1L, 1L, 1L)), 
class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 2

Matt Kowaleczko

Reputation: 53

I haven't used R for a while, so I won't paste in the code, but you can create a new df by grouping the initial one by gene, and then merge/join it back onto your initial df in another line of code.
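A minimal dplyr sketch of that group-then-join idea, assuming dplyr 1.0.0 or later (for across() and .groups) and that the count wanted is the number of non-NA values in a:c per gene; per_gene, counts and result are hypothetical names. With one row per gene it matches the row-wise answers, and if a gene appears in several rows the count covers all of them:

```r
library(dplyr)

# The question's example data (assumed: numeric columns with real NA values)
df <- data.frame(
  gene = c("abc", "bca", "cba"),
  a = c(1, NA, 1),
  b = c(NA, 1, 2),
  c = c(1, 1, 1)
)

# Step 1: one row per gene, counting the non-NA values of each column
per_gene <- df %>%
  group_by(gene) %>%
  summarise(across(a:c, ~ sum(!is.na(.x))), .groups = "drop")

# Collapse the per-column counts into a single freq column
counts <- data.frame(
  gene = per_gene$gene,
  freq = rowSums(per_gene[, c("a", "b", "c")])
)

# Step 2: join the counts back onto the original data frame by gene
result <- left_join(df, counts, by = "gene")
result
```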

Upvotes: 0
