AishwaryaKulkarni

Reputation: 784

Frequency of a column over a range of another column in R

I have data that looks like this:

GeneID Score
ABC     0.1
EFH     0.2
ABC     0.5
STY     0.1
TRQ     0.2
TRQ     0.1
EFH     0.5
EFH     0.1
EFH     0.01

I want to get the frequency of column 1 within bins of column 2, as follows:

<=0.1             4
>0.1 and <=0.5    4

This should work even if there are redundant values in column 1. Also, if a particular value in column 1 appears twice in the same range, how do I count it just once?

Upvotes: 2

Views: 260

Answers (3)

David Arenburg

Reputation: 92282

You don't need any ifelse statements here; just use cut:

table(droplevels(cut(df$Score, c(-Inf, .1, .5, Inf))))
# (-Inf,0.1]  (0.1,0.5] 
#          5          4 

If Score is bounded as in the provided data set, tabulating a single condition is enough:

setNames(table(df$Score > 0.1), c(" <= 0.1", "> 0.1"))
# <= 0.1   > 0.1 
#      5       4 
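
Both calls above count rows, so a GeneID that falls in the same bin twice is counted twice. A minimal sketch of one way to count each GeneID once per bin, assuming the data frame is named df and holds the sample data from the question (the names unique_pairs and unique_counts are only illustrative):

```r
# Sample data copied from the question; df matches the name used in the calls above.
df <- data.frame(
  GeneID = c("ABC", "EFH", "ABC", "STY", "TRQ", "TRQ", "EFH", "EFH", "EFH"),
  Score  = c(0.1, 0.2, 0.5, 0.1, 0.2, 0.1, 0.5, 0.1, 0.01)
)

# Assign each row to a right-closed bin: (-Inf, 0.1] or (0.1, 0.5].
df$bin <- cut(df$Score, breaks = c(-Inf, 0.1, 0.5))

# Drop repeated (GeneID, bin) pairs so each gene counts once per bin.
unique_pairs <- unique(df[, c("GeneID", "bin")])

# Tabulate the remaining pairs by bin.
unique_counts <- table(unique_pairs$bin)
unique_counts
```

For the sample data this gives 4 distinct GeneIDs in (-Inf,0.1] and 3 in (0.1,0.5], compared with row counts of 5 and 4.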

Upvotes: 1

rrs

Reputation: 9893

Assuming your data frame is called df, here's what I'd do:

library(dplyr)

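# Label each row with its bin, then count rows per bin.
# The result is printed rather than assigned, so df is not overwritten.
df %>%
  mutate(bin = ifelse(Score <= 0.1, "(,0.1]", ifelse(Score <= 0.5, "(0.1,0.5]", "(0.5,]"))) %>%
  group_by(bin) %>%
  summarise(N = n())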

Which returns

Source: local data frame [2 x 2]

        bin N
1    (,0.1] 5
2 (0.1,0.5] 4
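
The pipeline above counts rows per bin. If each GeneID should count only once per bin, one possible variation is to swap n() for dplyr's n_distinct(). This is a sketch under the assumption that df holds the sample data from the question; it also uses cut() in place of the nested ifelse, and the name distinct_per_bin is only illustrative:

```r
library(dplyr)

# Sample data copied from the question.
df <- data.frame(
  GeneID = c("ABC", "EFH", "ABC", "STY", "TRQ", "TRQ", "EFH", "EFH", "EFH"),
  Score  = c(0.1, 0.2, 0.5, 0.1, 0.2, 0.1, 0.5, 0.1, 0.01)
)

distinct_per_bin <- df %>%
  # cut() assigns right-closed bins: (-Inf, 0.1] and (0.1, 0.5].
  mutate(bin = cut(Score, breaks = c(-Inf, 0.1, 0.5))) %>%
  group_by(bin) %>%
  # n_distinct() counts each GeneID once within a bin.
  summarise(N = n_distinct(GeneID))

distinct_per_bin
```

With the sample data, N is 4 for the first bin and 3 for the second.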

Upvotes: 1

Dan Kehila

Reputation: 23

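This should work with the plyr package, where data is your data frame:

library(plyr)
# Per GeneID: its share of all rows, and the spread (max - min) of its scores
ddply(data, .(GeneID), summarize, frequency = length(GeneID) / nrow(data),
                                  range = max(Score) - min(Score))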

Upvotes: 0
