Reputation: 933
I'm trying to find the number of times each unique gene is found across samples, along with its respective p-values.
df1 <- read.table(text="
Gene id Seg.mean pValue CNA
Nfib 8410 0.3108 1.381913 gain
Mycl 8410 2.7320 1.182842 gain
Mycl 8410 2.7320 1.846275 gain
Nfib 8411 0.5920 1.381913 gain
Nfib 8411 1.3090 1.381913 gain
Mycl 8412 1.6150 5.765442 gain
Mycl 8411 1.6150 1.846275 gain
",header=TRUE)
Expected output:
Gene  ID              Freq. of id  pValue
Nfib  8410,8411       2            1.381913
Mycl  8410,8411,8412  3            1.182842,1.846275,5.765442
Upvotes: 3
Views: 88
Reputation: 4335
library(plyr)
ddply(data.frame(df1), .(Gene), summarise,
      ID = paste(unique(id), collapse = ","),
      pValue = paste(unique(pValue), collapse = ","),
      Freq = length(unique(id)))
  Gene             ID                     pValue Freq
1 Mycl 8410,8412,8411 1.182842,1.846275,5.765442    3
2 Nfib      8410,8411                   1.381913    2
Upvotes: 1
Reputation: 7119
I think you can use data.table to get very close to the result you want to achieve:
library(data.table)
df1<-data.table(df1)
df1[,
    list(ID = paste(unique(id), collapse = ","),
         "Freq. of id" = length(unique(id)),
         pValue = paste(unique(pValue), collapse = ",")),
    keyby = list(Gene)]
Upvotes: 1
Reputation: 4194
library(dplyr)
df1 %>%
  group_by(Gene) %>%
  summarise(ID = paste0(unique(id), collapse = ", "),
            pval = paste0(unique(pValue), collapse = ", "),
            n = n_distinct(id))
  Gene               ID                         pval n
1 Mycl 8410, 8412, 8411 1.182842, 1.846275, 5.765442 3
2 Nfib       8410, 8411                     1.381913 2
Gene is the unit of analysis, so group the rows with group_by(Gene). Within summarise, paste0(var, collapse=", ") joins the unique values of a column into a single string. This is applied per Gene.
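For anyone who wants to see the same per-Gene collapse without extra packages, here is a rough base R sketch. It assumes the df1 from the question. The helper name collapse_unique is made up for illustration, and the sort() call is my own addition, on the assumption that the ids should come out in ascending order as in the expected output.

```r
# Sample data copied from the question.
df1 <- read.table(text = "
Gene id Seg.mean pValue CNA
Nfib 8410 0.3108 1.381913 gain
Mycl 8410 2.7320 1.182842 gain
Mycl 8410 2.7320 1.846275 gain
Nfib 8411 0.5920 1.381913 gain
Nfib 8411 1.3090 1.381913 gain
Mycl 8412 1.6150 5.765442 gain
Mycl 8411 1.6150 1.846275 gain
", header = TRUE)

# Hypothetical helper (not from any package): sorted unique values of x,
# joined into one comma-separated string.
collapse_unique <- function(x) paste(sort(unique(x)), collapse = ",")

# Split the rows by Gene, build one summary row per group, then stack them.
res <- do.call(rbind, lapply(split(df1, df1$Gene), function(g) {
  data.frame(
    Gene   = as.character(g$Gene[1]),
    ID     = collapse_unique(g$id),
    Freq   = length(unique(g$id)),      # number of distinct sample ids
    pValue = collapse_unique(g$pValue),
    stringsAsFactors = FALSE
  )
}))
rownames(res) <- NULL
print(res)
```

With the sample data, Mycl should come out with ID "8410,8411,8412" and Freq 3, and Nfib with ID "8410,8411" and Freq 2. Drop the sort() if you would rather keep the order in which the values first appear.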
Upvotes: 2