Reputation: 57
On the tidyverse reference website, I saw two usages: mutate(mtcars, row_number() == 1L) and mtcars %>% filter(between(row_number(), 1, 10)). It would be straightforward to think that the row_number() function returns the row number of each observation in the data frame.
However, the documentation emphasizes that the function is a window function and is similar to sortperm in other languages. As in the example:
x <- c(5, 1, 3, 2, 2, NA)
row_number(x)
# [1] 5 1 4 2 3 NA
May I ask whether this function is intended to report the row number of each observation? If it is, what is the logic behind the function call?
Thanks!
Upvotes: 2
Views: 253
Reputation: 48211
As ?row_number says, row_number is equivalent to rank(ties.method = "first"), where rank (see ?rank) returns the sample ranks of the values in a vector, and using "first" results in a permutation with increasing values at each index set of ties:
row_number
# function (x)
# rank(x, ties.method = "first", na.last = "keep")
# <bytecode: 0x108538478>
# <environment: namespace:dplyr>
So,
x <- c(5, 1, 3, 2, 2, NA)
row_number(x)
# [1] 5 1 4 2 3 NA
rank(x, ties.method = "first", na.last = "keep") # I added na.last = "keep" to fully replicate row_number
# [1] 5 1 4 2 3 NA
since
sort(x)
# [1] 1 2 2 3 5
and we gave a lower rank to the first 2 due to ties.method = "first".
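To see where those ranks come from without dplyr, here is a small base R sketch. The variable names are my own, and I dropped the NA so the comparison stays simple. It also shows how this relates to a sortperm-style function: order() gives the sorting permutation, and the ranks are its inverse.

```r
# Same values as above, without the NA
x <- c(5, 1, 3, 2, 2)

# Sorting permutation: indices of x in increasing order (ties keep their original order)
ord <- order(x)
# [1] 2 4 5 3 1

# Rank of each element, with ties broken by first appearance
ranks <- rank(x, ties.method = "first")
# [1] 5 1 4 2 3

# The ranks are the inverse of the sorting permutation
inv <- order(ord)
# [1] 5 1 4 2 3
```

So the result tells you where each element would land after sorting, not which elements to pick in order to sort.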
Now, when we simply call row_number() with no argument inside a filter or mutate call, it does indeed seem to return a vector of row numbers, as can be found here.
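A quick way to compare the two behaviours side by side is the sketch below. The data frame df and the column names position and value_rank are made up for illustration, and it assumes dplyr is installed.

```r
library(dplyr)

# Toy data frame reusing the vector from the question (hypothetical names)
df <- data.frame(value = c(5, 1, 3, 2, 2, NA))

out <- df %>%
  mutate(
    position   = row_number(),      # no argument: position of each row
    value_rank = row_number(value)  # with argument: rank, ties broken by first appearance
  )

out$position
# [1] 1 2 3 4 5 6
out$value_rank
# [1]  5  1  4  2  3 NA
```

With grouped data, I would expect the argument-free form to count rows within each group rather than across the whole data frame.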
Upvotes: 3