asshah4

Reputation: 184

Error in group_by function in dplyr

I've looked through the related dplyr questions and the R documentation, and tried to sort out what I believe is a syntax misunderstanding.

Here is sample data that reflects the structure of my data.

library(dplyr)

id <- 1:20   # recycled to 100 rows by data.frame()
xvar <- seq(from=2.0, to=6.0, length.out=100)
yvar <- 1:100
binary <- sample(x=c(0,1), size=100, replace=TRUE)

# Bin yvar into ten intervals: (0,11], (11,21], ..., (91,100]
breaks <- c(0,11,21,31,41,51,61,71,81,91,100)
df <- data.frame(id, xvar, yvar, binary)
df <- transform(df, bin=cut(yvar, breaks))

  id     xvar yvar binary    bin
1  1 2.000000    1      1 (0,11]
2  2 2.040404    2      0 (0,11]
3  3 2.080808    3      0 (0,11]
4  4 2.121212    4      0 (0,11]
5  5 2.161616    5      1 (0,11]
6  6 2.202020    6      0 (0,11]

I'd like to run the following, testing within each bin whether the mean of xvar differs significantly between the two values of binary.

pval <- df %>%
  group_by(bin) %>%
  summarise(p.value = t.test(xvar ~ factor(binary))$p.value)

However, I keep getting the error "grouping factor must have exactly 2 levels".

I saw a similar post, but there the problem was in how t.test was being called. I've run this same code with a different grouping variable in group_by and it worked just fine; that variable was also a factor.

Any thoughts? I'd also appreciate critiques on how I could have posed this question better.

Upvotes: 1

Views: 997

Answers (2)

asshah4

Reputation: 184

I think I've resolved the issue.

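The error "grouping factor must have exactly 2 levels" is raised by t.test() whenever there is not enough data in a group for a two-sample test: specifically, when the grouping factor within that group does not have exactly two levels (for example, a bin in which every row has the same value of binary). I had just assumed that my original data set, which is large, would have enough data not to run into this issue.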
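To make the cause concrete, here is a minimal sketch on small hypothetical data (the `toy` data frame is made up for illustration, not taken from the question). It reproduces the error on a bin that contains only one value of `binary`, then uses dplyr's `n_distinct()` to count the distinct values of `binary` per bin, so bins that would trigger the error can be spotted before running the tests.

```r
library(dplyr)

# Hypothetical toy data: bin "b" contains only binary == 0,
# so factor(binary) has a single level inside that bin.
toy <- data.frame(
  xvar   = c(2.1, 2.5, 3.0, 3.4, 4.0, 4.2, 4.8, 5.1),
  binary = c(0, 1, 0, 1, 0, 0, 0, 0),
  bin    = factor(c("a", "a", "a", "a", "b", "b", "b", "b"))
)

# Running t.test() on the "b" rows alone reproduces the error;
# tryCatch() captures the message instead of stopping.
b_rows <- toy[toy$bin == "b", ]
err <- tryCatch(
  t.test(xvar ~ factor(binary), data = b_rows),
  error = function(e) conditionMessage(e)
)
print(err)

# Count the distinct values of binary in each bin;
# any bin with n_levels < 2 will make t.test() fail.
level_check <- toy %>%
  group_by(bin) %>%
  summarise(n_levels = n_distinct(binary))
print(level_check)
```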

When I made the sample data larger, the error disappeared.
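If some bins can legitimately lack one of the two groups, one option is to return `NA` for those bins instead of letting the whole summarise() call stop. The sketch below uses a hypothetical helper, `safe_t_pvalue()`, that only runs the test when both values of the grouping variable are present with at least two observations each (the default Welch test in `t.test()` also errors when a group has fewer than two observations). The `toy` data are again made up for illustration.

```r
library(dplyr)

# Hypothetical helper: Welch t-test p-value, or NA when the group
# lacks one of the two values of g or has fewer than 2 rows per value.
safe_t_pvalue <- function(x, g) {
  counts <- table(g)
  if (length(counts) != 2 || any(counts < 2)) {
    return(NA_real_)
  }
  t.test(x ~ factor(g))$p.value
}

# Hypothetical toy data: bin "b" has only binary == 0.
toy <- data.frame(
  xvar   = c(2.1, 2.5, 3.0, 3.4, 4.0, 4.2, 4.8, 5.1),
  binary = c(0, 1, 0, 1, 0, 0, 0, 0),
  bin    = factor(c("a", "a", "a", "a", "b", "b", "b", "b"))
)

# Inside summarise(), xvar and binary refer to each bin's rows.
pvals <- toy %>%
  group_by(bin) %>%
  summarise(p.value = safe_t_pvalue(xvar, binary))
print(pvals)
```

Bin "a" gets an ordinary p-value and bin "b" gets `NA`, which keeps the problem bins visible in the result rather than hiding them.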

Sorry for the wasted time, and thank you for your help!

Upvotes: 1

Hong Ooi

Reputation: 57686

You don't want to use dplyr for this. You want to fit a linear model.

mod <- lm(xvar ~ binary*bin, data=df)
anova(mod)

For further discussion of what the coefficients, P-values and sums of squares mean, consider asking on stats.SE.
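If the question is whether the effect of `binary` on `xvar` differs across bins, the relevant row of that ANOVA table is the interaction term. A minimal sketch, rebuilding data shaped like the question's example (the `set.seed()` call is an addition so the sketch is reproducible), that pulls that row's p-value out by its term name:

```r
# Data shaped like the question's example; set.seed() is added
# here only to make the sketch reproducible.
set.seed(1)
df <- data.frame(
  xvar   = seq(from = 2.0, to = 6.0, length.out = 100),
  yvar   = 1:100,
  binary = sample(x = c(0, 1), size = 100, replace = TRUE)
)
df$bin <- cut(df$yvar, breaks = c(0, 11, 21, 31, 41, 51, 61, 71, 81, 91, 100))

mod <- lm(xvar ~ binary * bin, data = df)
tab <- anova(mod)

# Row names of the ANOVA table are the model terms; the "binary:bin"
# row tests whether the binary effect on xvar differs across bins.
p_interaction <- tab["binary:bin", "Pr(>F)"]
print(p_interaction)
```

Note that this gives one overall test across all bins rather than one p-value per bin as the dplyr approach does.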

Upvotes: 1
