Is there a difference in the relative frequencies? Using R

Question

I need help figuring out how to use R to determine whether there is a difference in the relative frequencies of my data set. I keep reading about different possibilities, but I'm not sure if I'm doing it right. What I want to know is whether the values under the "Total.Clusters" column for the V13 and V35 gene.fragment are significantly different from the whole gene.fragment value. This is what my data looks like; I have 9700 data points:

Total.Clusters  Singleton.clusters >1seq.clusters   gene.fragment   algorithm
5427              3767             1660             whole           uclust
5929              4277             1652             V13             uclust
3911              2312             1599             V35             uclust

To test normality, would I do the following in R:

data1<-read.csv(file.choose())    
x<-data1[,c(1)]
shapiro.test(x)
## 
## Shapiro-Wilk normality test
## data:  x
## W = 0.9224, p-value = 0.4607

So since the "Total.Clusters" column is normal, could I use a t.test to compare the whole gene/fragment value and the V13 and V35 values?

I'm just not sure how to do this, because I've tried different things but I'm not sure which is the right way to do it.

edit: So in essence, I'm trying to figure out if 5427 and 5929 are significantly different from each other, and whether 5427 and 3911 are significantly different from each other.

EDIT:

I realized that this question didn't make much sense. I went with different data and ended up using the chisq.test() function in R.
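For anyone landing here later, this is a rough sketch of what a chisq.test() call for comparing relative frequencies might look like. The data I actually ended up using isn't shown in this post, so the sketch reuses the three rows from the table above purely as an illustration; the object names (counts, res) and the "multi.seq" label for the >1seq.clusters column are made up for the example.

```r
# Illustration only: counts copied from the three rows shown in the question.
# Rows are gene fragments, columns are singleton vs. >1seq clusters.
counts <- matrix(
  c(3767, 1660,
    4277, 1652,
    2312, 1599),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    gene.fragment = c("whole", "V13", "V35"),
    cluster.type  = c("singleton", "multi.seq")  # hypothetical column labels
  )
)

# Chi-squared test of independence: do the relative frequencies of
# singleton vs. multi-sequence clusters differ between fragments?
res <- chisq.test(counts)
print(res)

# Row-wise proportions, to see which fragments differ and in which direction
print(prop.table(counts, margin = 1))
```

Note that this tests whether the split between the two cluster types depends on the fragment, which is a different question from comparing the Total.Clusters values themselves.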

Julián Urbano · Accepted Answer

So in essence, I'm trying to figure out if 5427 and 5929 are significantly different from each other, and whether 5427 and 3911 are significantly different from each other.

That doesn't make sense; you can't test significant differences between single numbers. What you can test is whether there is a significant difference among the distributions of whole, V13 and V35. You can do so with pairwise.t.test:

pairwise.t.test(data1$Total.Clusters, data1$gene.fragment, p.adjust.method="none")

Check out ?pairwise.t.test for multiple comparisons options.
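As a minimal sketch of how that call behaves with an adjustment for multiple comparisons, here is an example on simulated data. Everything about the data is an assumption made for illustration: the data frame name sim, the ten observations per group, and the means and standard deviation are invented, and only the column names follow the question. Substitute your own data1 once it has several observations per gene.fragment.

```r
# Simulated stand-in for data1: 10 hypothetical observations per fragment.
# Means and sd are invented for illustration, not taken from real data.
set.seed(1)
sim <- data.frame(
  Total.Clusters = c(rnorm(10, mean = 5400, sd = 300),
                     rnorm(10, mean = 5900, sd = 300),
                     rnorm(10, mean = 3900, sd = 300)),
  gene.fragment  = factor(rep(c("whole", "V13", "V35"), each = 10))
)

# Pairwise t tests between all fragment groups, with Holm correction
# for multiple comparisons instead of p.adjust.method = "none"
res <- pairwise.t.test(sim$Total.Clusters, sim$gene.fragment,
                       p.adjust.method = "holm")

# Lower-triangular matrix of adjusted p-values, one per pair of groups
print(res$p.value)
```

The test needs more than one observation per group to estimate the variance, which is why a single value per fragment can't be compared this way.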
