Reputation: 1191
I need help figuring out how to use R to determine whether there is a difference in the relative frequencies in my data set. I keep reading about different possibilities, but I'm not sure I'm doing it right. What I want to know is whether the values in the "Total.Clusters" column for the V13 and V35 gene.fragment rows are significantly different from the whole gene.fragment value. This is what my data looks like (I have 9700 data points):
Total.Clusters  Singleton.clusters  >1seq.clusters  gene.fragment  algorithm
5427            3767                1660            whole          uclust
5929            4277                1652            V13            uclust
3911            2312                1599            V35            uclust
To test normality, would I do the following in R:
data1<-read.csv(file.choose())
x<-data1[,1]  # first column (Total.Clusters)
shapiro.test(x)
##
## Shapiro-Wilk normality test
## data: x
## W = 0.9224, p-value = 0.4607
So since the "Total.Clusters" column appears to be normal, could I use t.test to compare the whole gene.fragment value with the V13 and V35 values?
I'm just not sure how to do this, because I've tried different things but I'm not sure which is the right way to do it.
edit: So in essence, I'm trying to figure out if 5427 and 5929 are significantly different from each other, and whether 5427 and 3911 are significantly different from each other.
EDIT:
I realized that this question didn't make much sense. I went with different data and ended up using the chisq.test() function in R.
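For readers wondering what a chisq.test() call might look like here, the following is a minimal sketch that uses only the three rows shown in the question. It is not necessarily what was actually run on the different data. It assumes the singleton and >1seq counts can be treated as a 3 x 2 contingency table; the object names (clusters, overall, whole_vs_v13, whole_vs_v35) and the column label multi.seq are made up for illustration.

```r
# Counts copied from the three rows shown in the question
clusters <- matrix(
  c(3767, 1660,
    4277, 1652,
    2312, 1599),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    gene.fragment = c("whole", "V13", "V35"),
    cluster.type  = c("singleton", "multi.seq")  # multi.seq = ">1seq" clusters
  )
)

# Chi-squared test of independence across all three fragments
overall <- chisq.test(clusters)
print(overall)

# Follow-up: compare each fragment against "whole" on its own 2 x 2 table
# (chisq.test applies Yates' continuity correction to 2 x 2 tables by default)
whole_vs_v13 <- chisq.test(clusters[c("whole", "V13"), ])
whole_vs_v35 <- chisq.test(clusters[c("whole", "V35"), ])

# Adjust the two follow-up p-values for multiple comparisons
p.adjust(c(V13 = whole_vs_v13$p.value, V35 = whole_vs_v35$p.value),
         method = "holm")
```

This tests whether the proportion of singleton clusters differs between fragments, which is one way to read "difference in the relative frequencies"; whether that is the right question depends on the data.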
Upvotes: 0
Views: 82
Reputation: 8488
So in essence, I'm trying to figure out if 5427 and 5929 are significantly different from each other, and whether 5427 and 3911 are significantly different from each other.
That doesn't make sense; you can't test significant differences between single numbers. What you can test is whether there is a significant difference among the distributions of whole, V13 and V35. You can do so with pairwise.t.test:
pairwise.t.test(data1$Total.Clusters, data1$gene.fragment, p.adjust.method="none")
Check out ?pairwise.t.test for multiple-comparison options.
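As a rough illustration of the call with a multiple-comparison correction, here is a small self-contained sketch. The data frame sim is simulated and purely hypothetical (10 made-up values per fragment), standing in for data1; "holm" is just one of the methods accepted by p.adjust.method.

```r
set.seed(1)  # make the simulated values reproducible

# Hypothetical data: 10 simulated values per fragment (not the asker's data)
sim <- data.frame(
  Total.Clusters = c(rnorm(10, mean = 5400, sd = 300),
                     rnorm(10, mean = 5900, sd = 300),
                     rnorm(10, mean = 3900, sd = 300)),
  gene.fragment  = factor(rep(c("whole", "V13", "V35"), each = 10))
)

# Pairwise t-tests between all fragment levels, with Holm correction
res <- pairwise.t.test(sim$Total.Clusters, sim$gene.fragment,
                       p.adjust.method = "holm")

# Lower-triangular matrix of adjusted p-values, one per pair of fragments
print(res$p.value)
```

Each group needs several observations for this to work; with a single value per fragment there is no within-group variance to estimate.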
Upvotes: 2