Reputation: 15
I have a question about whether I can run a Wilcoxon test in a loop over all the tables I generated.
Basically, I want to run a paired Wilcoxon test between two variables for each dataset; the two variables are in the same positions (e.g., the xth and yth columns) in every dataset. (For people familiar with biology, these are RPKM values for control and treated samples for some repetitive elements.) I hope to generate a table of the Wilcoxon p-values, one per dataset.
I already generated all the tables (datasets/data frames) using the code below. I think I need to continue the loop to run a Wilcoxon test on each dataset, but I don't know how to do it:
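data = sample_vs_norm
filter = unique(data$family)
for (i in seq_along(filter)) {
  # build one table per group, named table_<group>
  table_name = paste('table_', filter[i], sep = "")
  print(table_name)
  # NOTE: the column compared here (Subfamily) should be the same one used to build filter above
  assign(table_name, data[data$Subfamily == filter[i], ])
}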
Here is the structure of a single dataset. Basically, I would like to run a Wilcoxon test between the variables "R009_initial_filter_rpkm" and "normal_filter_rpkm":
Chr Start End Mappability Strand R009_initial_filter_NormalizedCounts
1: chr11 113086868 113087173 1 - 2
2: chr2 24290845 24291132 1 - 11
3: chr4 15854425 15854650 1 - 0
4: chr6 43489623 43489676 1 + 11
normal_filter_NormalizedCounts R009_initial_filter_rpkm
1: 14.569000 0.169752
2: 1.000000 0.992191
3: 14.815900 0.000000
4: 0.864262 5.372810
normal_filter_rpkm FoldChange p.value FDR FoldChangeFPKM
1: 1.236560 0.137278 0.999862671 1.000000000 0.1372776
2: 0.000000 11.000000 0.003173828 0.008149271 Inf
3: 1.704630 0.000000 1.000000000 1.000000000 0.0000000
4: 0.422137 12.727600 0.003173828 0.008149271 12.7276453
structure(list(Chr = structure(1:4, .Label = c("chr11", "chr2",
"chr4", "chr6"), class = "factor"), Start = c(113086868L, 24290845L,
15854425L, 43489623L), End = c(113087173L, 24291132L, 15854650L,
43489676L), Mappability = c(1L, 1L, 1L, 1L), Strand = structure(c(1L,
1L, 1L, 2L), .Label = c("-", "+"), class = "factor"), R009_initial_filter_NormalizedCounts = c(2L,
11L, 0L, 11L), normal_filter_NormalizedCounts = c(14.569,
1, 14.8159, 0.864262), R009_initial_filter_rpkm = c(0.169752,
0.992191, 0, 5.37281), normal_filter_rpkm = c(1.23656,
0, 1.70463, 0.422137), FoldChange = c(0.137278, 11, 0, 12.7276
), p.value = c(0.999862671, 0.003173828, 1, 0.003173828), FDR = c(1,
0.008149271, 1, 0.008149271), FoldChangeFPKM = c(0.1372776, Inf,
0, 12.7276453), class = "data.frame", row.names = c(NA,
-4L))
I'm sorry if I use incorrect terminology as I am a newbie in R, and thank you so much for the help
Upvotes: 1
Views: 708
Reputation: 24848
One approach is to use grouping with by = in data.table.
library(data.table)
setDT(data)
data[,wilcox.test(R009_initial_filter_rpkm,
normal_filter_rpkm)[c("statistic","p.value")],
by = TE_Subfamily]
# TE_Subfamily statistic p.value
#1: AluYf4 7.5 1
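Note that wilcox.test() runs the unpaired (rank-sum) test by default, while the question asks for a paired test. A minimal sketch of the paired variant follows, assuming each row pairs a treated value with its control value. The toy table reuses the four rows from the question, and the TE_Subfamily label "AluYf4" is assumed from the output above.

```r
library(data.table)

# Toy data: the four rpkm rows from the question, assumed to belong to one subfamily.
# TE_Subfamily is the grouping column assumed throughout this answer.
data <- data.table(
  TE_Subfamily = "AluYf4",
  R009_initial_filter_rpkm = c(0.169752, 0.992191, 0, 5.37281),
  normal_filter_rpkm       = c(1.23656, 0, 1.70463, 0.422137)
)

# paired = TRUE switches to the Wilcoxon signed-rank test on the row-wise differences
paired_res <- data[, wilcox.test(R009_initial_filter_rpkm,
                                 normal_filter_rpkm,
                                 paired = TRUE)[c("statistic", "p.value")],
                   by = TE_Subfamily]
print(paired_res)
```

The statistic reported in this case is the signed-rank V rather than the rank-sum W, so the numbers will differ from the unpaired output shown above.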
You can group by any number of variables, for example TE_Subfamily and Chr:
data[TE_Subfamily %in% filter,
wilcox.test(R009_initial_filter_rpkm,
normal_filter_rpkm)[c("statistic","p.value")],
by = .(TE_Subfamily,Chr)]
# TE_Subfamily Chr statistic p.value
#1: AluYf4 chr11 0 1
#2: AluYf4 chr2 1 1
#3: AluYf4 chr4 0 1
#4: AluYf4 chr6 1 1
If you only need to perform comparisons for certain TE_Subfamily values, you could try something like this:
filter <- c("AluYf4")
data[TE_Subfamily %in% filter,
wilcox.test(R009_initial_filter_rpkm,
normal_filter_rpkm)[c("statistic","p.value")],
by = TE_Subfamily]
# TE_Subfamily statistic p.value
#1: AluYf4 7.5 1
For bonus points, you can correct for multiple testing:
data[TE_Subfamily %in% filter,
wilcox.test(R009_initial_filter_rpkm,
normal_filter_rpkm)[c("statistic","p.value")],
by = TE_Subfamily][,adjusted.p.value := p.adjust(p.value,method = "bonferroni")][]
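If you would rather stay close to the loop in the question, the same table of p-values can be built in base R by splitting the data and testing each piece. This is only a sketch under assumptions: the grouping column is called TE_Subfamily, the first four rows are the values from the question, and the "L1PA2" rows are made-up numbers added so there is a second group.

```r
# Toy data frame: rows 1-4 come from the question; the "L1PA2" rows are hypothetical.
data <- data.frame(
  TE_Subfamily = rep(c("AluYf4", "L1PA2"), each = 4),
  R009_initial_filter_rpkm = c(0.169752, 0.992191, 0, 5.37281, 2.1, 3.4, 1.2, 4.8),
  normal_filter_rpkm       = c(1.23656, 0, 1.70463, 0.422137, 1.0, 2.5, 0.7, 3.0)
)

# One data frame per subfamily, instead of assign()-ing separate table_* objects
groups <- split(data, data$TE_Subfamily)

# Paired Wilcoxon test within each group; keep only the p-value
p_values <- vapply(groups, function(d) {
  wilcox.test(d$R009_initial_filter_rpkm,
              d$normal_filter_rpkm,
              paired = TRUE)$p.value
}, numeric(1))

# Collect everything in one table, with a Bonferroni-adjusted column
result <- data.frame(
  TE_Subfamily     = names(p_values),
  p.value          = unname(p_values),
  adjusted.p.value = unname(p.adjust(p_values, method = "bonferroni")),
  row.names        = NULL
)
print(result)
```

Keeping the pieces in a list avoids creating many table_* variables with assign(), and the result is already in the one-row-per-dataset shape the question asks for.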
Upvotes: 1