Reputation: 784
I have data that looks like this:
GeneID Score
ABC 0.1
EFH 0.2
ABC 0.5
STY 0.1
TRQ 0.2
TRQ 0.1
EFH 0.5
EFH 0.1
EFH 0.01
I want to get the frequency of column 1 over bins of column 2, as follows:
<=0.1 4
>0.1 and <=0.5 4
This should work even if there are redundant values in column 1. Also, if a particular value in column 1 appears twice in the same range, how do I count it just once?
Upvotes: 2
Views: 260
Reputation: 92282
You don't need any ifelse statements here, just use cut:
table(droplevels(cut(df$Score, c(-Inf, .1, .5, Inf))))
# (-Inf,0.1] (0.1,0.5]
# 5 4
Though if Score is bounded like in the provided data set, all you need to do is use table by condition:
setNames(table(df$Score > 0.1), c(" <= 0.1", "> 0.1"))
# <= 0.1 > 0.1
# 5 4
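Both calls above count rows, so a GeneID that falls into the same bin twice is counted twice. To count each GeneID only once per bin, one option is to combine cut with tapply. This is a minimal sketch, assuming the data frame is called df and has the columns GeneID and Score from the question; the names bin and unique_counts are only illustrative.

```r
# Sample data copied from the question
df <- data.frame(
  GeneID = c("ABC", "EFH", "ABC", "STY", "TRQ", "TRQ", "EFH", "EFH", "EFH"),
  Score  = c(0.1, 0.2, 0.5, 0.1, 0.2, 0.1, 0.5, 0.1, 0.01),
  stringsAsFactors = FALSE
)

# Same breaks as above; droplevels() removes the empty (0.5, Inf] bin
bin <- droplevels(cut(df$Score, c(-Inf, 0.1, 0.5, Inf)))

# Number of distinct GeneIDs in each bin
unique_counts <- tapply(df$GeneID, bin, function(x) length(unique(x)))
unique_counts
# 4 distinct genes in (-Inf,0.1], 3 in (0.1,0.5]
```

For the sample data, EFH appears twice in each bin, which is why the distinct counts (4 and 3) are lower than the row counts (5 and 4).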
Upvotes: 1
Reputation: 9893
Assuming your data frame is called df, here's what I'd do:
library(dplyr)
df <- df %>%
  mutate(bin = ifelse(Score <= 0.1, "(,0.1]",
                      ifelse(Score <= 0.5, "(0.1,0.5]", "(0.5,]"))) %>%
  group_by(bin) %>%
  summarise(N = n())
Which returns:
Source: local data frame [2 x 2]
bin N
1 (,0.1] 5
2 (0.1,0.5] 4
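N here is the number of rows per bin, so repeated GeneIDs are counted each time they appear. If each GeneID should be counted only once per bin, a possible variation is to add n_distinct to the summary. The sketch below assumes the question's sample data and swaps the nested ifelse for cut; the names res and unique_genes are made up for illustration.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  GeneID = c("ABC", "EFH", "ABC", "STY", "TRQ", "TRQ", "EFH", "EFH", "EFH"),
  Score  = c(0.1, 0.2, 0.5, 0.1, 0.2, 0.1, 0.5, 0.1, 0.01),
  stringsAsFactors = FALSE
)

res <- df %>%
  # cut() assigns each Score to a right-closed interval
  mutate(bin = cut(Score, c(-Inf, 0.1, 0.5, Inf))) %>%
  group_by(bin) %>%
  # N counts rows, unique_genes counts distinct GeneIDs
  summarise(N = n(), unique_genes = n_distinct(GeneID))

res
```

Keeping the result in a separate object (res) leaves the original df intact, which helps if the raw scores are needed again later.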
Upvotes: 1
Reputation: 23
This should work with the plyr package:
library(plyr)
ddply(data, .(GeneID), summarize,
      frequency = length(GeneID) / nrow(data),
      range = max(Score) - min(Score))
Upvotes: 0