Reputation: 184
I've looked through the related dplyr questions, the R documentation, and attempted to sort through what I believe is a syntax misunderstanding.
Here is sample data that reflects the structure of my data.
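library(dplyr)  # provides %>%, group_by() and summarise() used below
id <- 1:20  # recycled by data.frame() to fill all 100 rows
xvar <- seq(from=2.0, to=6.0, length.out=100)
yvar <- 1:100
binary <- sample(x=c(0,1), size=100, replace=TRUE)  # random 0/1 grouping variable
breaks <- c(0,11,21,31,41,51,61,71,81,91,100)
df <- data.frame(id, xvar, yvar, binary)
df <- transform(df, bin=cut(yvar, breaks))  # assign each row to a yvar interval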
  id     xvar yvar binary    bin
1  1 2.000000    1      1 (0,11]
2  2 2.040404    2      0 (0,11]
3  3 2.080808    3      0 (0,11]
4  4 2.121212    4      0 (0,11]
5  5 2.161616    5      1 (0,11]
6  6 2.202020    6      0 (0,11]
I'd like to run the following to test, within each bin group, whether the mean of xvar differs significantly between the two levels of the binary variable.
pval <- df %>% group_by(bin) %>% summarise(p.value=t.test(xvar ~ factor(binary))$p.value)
However, I continue to get the error: "grouping factor must have exactly 2 levels"
I saw a similar post to this, but the problem there was how t.test was being run. I've run this same code using a different group_by variable and it worked just fine. The data type was a factor and everything.
Any thoughts? I also would appreciate critiques on how to improve the manner in which this question was posed.
Upvotes: 1
Views: 997
Reputation: 184
I think I've resolved the issue.
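"Grouping factor must have exactly 2 levels" comes up whenever a group passed to t.test does not contain both levels of the grouping variable, which happens when there is not enough data in a bin. I just assumed my original data set, which is large, would have enough data in every bin to not run into this issue.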
When I made the sample data more robust, the error disappeared.
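For anyone hitting the same error, here is a minimal sketch of how one might find the offending bins and skip them. The toy data frame and the helper name safe_p are assumptions made up for illustration, not part of the original data. The guard only covers the case where a bin is missing one level of binary; t.test can still fail for other reasons, such as a level with a single observation.

```r
library(dplyr)

# Deterministic toy data (an assumption for illustration): bin "a" contains
# both levels of binary, bin "b" contains only binary == 0.
toy <- data.frame(
  xvar   = c(2.1, 2.5, 2.9, 3.4, 3.8, 4.2, 4.4, 4.9, 5.3, 5.8),
  binary = c(0, 1, 0, 1, 0, 1, 0, 0, 0, 0),
  bin    = rep(c("a", "b"), times = c(6, 4))
)

# Count how many distinct values of binary each bin contains.
# Any bin with n_levels != 2 would trigger the error in t.test().
level_counts <- toy %>%
  group_by(bin) %>%
  summarise(n_levels = n_distinct(binary))

# Hypothetical helper: run the t-test only when both levels are present,
# otherwise return NA instead of raising an error.
safe_p <- function(x, g) {
  if (n_distinct(g) != 2) return(NA_real_)
  t.test(x ~ factor(g))$p.value
}

pval <- toy %>%
  group_by(bin) %>%
  summarise(p.value = safe_p(xvar, binary))
```

With this toy data, level_counts flags bin "b" as having a single level, and pval holds NA for that bin rather than stopping the whole pipeline.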
Sorry for the wasted time, and thank you for your help!
Upvotes: 1
Reputation: 57686
You don't want to use dplyr for this. You want to fit a linear model.
mod <- lm(xvar ~ binary*bin, data=df)
anova(mod)
For further discussion of what the coefficients, P-values and sums of squares mean, consider asking on stats.SE.
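As a rough sketch of where to look in that output: with a model of this form, the binary:bin row of the ANOVA table is the sequential F-test for the interaction, i.e. whether the difference in xvar between the two binary levels varies across bins. The seed and the variable names tab and p_interaction are assumptions added here for illustration, and this is a single overall test, not a replacement for one p-value per bin.

```r
# Rebuild the sample data from the question; the seed is arbitrary and only
# makes the random binary column reproducible.
set.seed(1)
id <- 1:20
xvar <- seq(from = 2.0, to = 6.0, length.out = 100)
yvar <- 1:100
binary <- sample(x = c(0, 1), size = 100, replace = TRUE)
breaks <- c(0, 11, 21, 31, 41, 51, 61, 71, 81, 91, 100)
df <- data.frame(id, xvar, yvar, binary)
df <- transform(df, bin = cut(yvar, breaks))

# Main effects of binary and bin plus their interaction.
mod <- lm(xvar ~ binary * bin, data = df)
tab <- anova(mod)

# P-value for the interaction term, taken from the ANOVA table.
p_interaction <- tab["binary:bin", "Pr(>F)"]
print(tab)
```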
Upvotes: 1