Reputation: 8843
Try the following R code (with tidyverse loaded):
dice = data.frame(roll = sample(x = 1:6, size = 1000, replace = TRUE))
ones = dice %>% filter(roll == 1)
length(ones$roll)
mutate(
data.frame(n = 1:6),
len = length(filter(dice, roll == n)$roll))
The first three lines behave as expected and give a sensible count of the number of ones rolled. The final mutate() call does not raise an error, but it returns the same count in every row and gives a warning:
n len
1 1 152
2 2 152
3 3 152
4 4 152
5 5 152
6 6 152
Warning message:
In roll == n :
longer object length is not a multiple of shorter object length
What am I doing wrong? Is it picking up n as a vector instead of operating on individual values?
NB. I know this is not a sensible way to count the number of elements with each value. It's just a convenient problem to illustrate the issue, which occurs in a much messier example.
Thanks!
Upvotes: 1
Views: 1123
Reputation: 6441
dplyr thinks columnwise, not rowwise. That means it doesn't evaluate n = c(1,2,3,4,5,6) element after element, but all elements at once.
Doing:
mutate(
data.frame(n = 1:6),
len = length(filter(dice, roll == n)$roll))
I get
n len
1 1 164
2 2 164
3 3 164
4 4 164
5 5 164
6 6 164
Warning message:
In roll == n :
longer object length is not a multiple of shorter object length
Which is the same as:
sum(dice$roll == 1:6)
[1] 164
Warning message:
In dice$roll == 1:6 :
longer object length is not a multiple of shorter object length
This compares the two vectors position by position, recycling the shorter vector as often as necessary, and gives a warning when the longer length is not a multiple of the shorter one (here 1000 is not a multiple of 6).
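To make the recycling visible, here is a minimal base R sketch. The names recycled, implicit and explicit are only illustrative, and the seed is arbitrary; rep_len() builds by hand the vector that == effectively compares against.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
roll <- sample(1:6, size = 1000, replace = TRUE)

# 1:6 repeated along roll: 1,2,3,4,5,6,1,2,... cut off at 1000 elements
recycled <- rep_len(1:6, length(roll))

# suppressWarnings() hides the "not a multiple" warning for this comparison
implicit <- suppressWarnings(sum(roll == 1:6))
explicit <- sum(roll == recycled)

identical(implicit, explicit)  # TRUE
```

So the single number repeated in every row is just the count of positions where roll happens to match the recycled 1:6 pattern.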
If you put a rowwise() in between, it evaluates n element after element:
data.frame(n = 1:6) %>% rowwise() %>% mutate(len = length(filter(dice, roll == n)$roll))
# A tibble: 6 x 2
n len
<int> <int>
1 1 172
2 2 159
3 3 176
4 4 168
5 5 174
6 6 151
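As a possible alternative to rowwise(), you can keep mutate() columnwise and do the per-element work yourself, for example with base R's vapply(). This is only a sketch: counts and k are illustrative names, the seed is arbitrary, and sum(dice$roll == k) stands in for whatever the messier real computation is.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dice <- data.frame(roll = sample(1:6, size = 1000, replace = TRUE))

# vapply() calls the function once per element of n,
# so each comparison is against a single value k
counts <- data.frame(n = 1:6) %>%
  mutate(len = vapply(n, function(k) sum(dice$roll == k), integer(1)))

sum(counts$len)  # 1000, every roll is counted exactly once
```

Here the function passed to vapply() receives one value of n at a time, so no recycling takes place and no warning is issued.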
Upvotes: 2