Reputation: 488
My understanding was that dplyr::ntile and statar::xtile are trying to do the same thing. But sometimes the output is different:
dplyr::ntile(1:10, 5)
# [1] 1 1 2 2 3 3 4 4 5 5
statar::xtile(1:10, 5)
# [1] 1 1 2 2 3 3 3 4 5 5
I am converting Stata code into R, and statar::xtile gives the same output as the original Stata code, but I thought dplyr::ntile would be the equivalent in R.
The Stata help says that xtile is used to:
Create variable containing quantile categories
statar::xtile is obviously replicating this, whereas dplyr::ntile is described as:
a rough rank, which breaks the input vector into n buckets.
Do these mean the same thing?
If so, why do they give different answers?
And if not, then:
What is the difference?
When should you use one or the other?
Upvotes: 3
Views: 1826
Reputation: 488
Thanks @alistaire for pointing out that dplyr::ntile is only doing:
function (x, n) { floor((n * (row_number(x) - 1)/length(x)) + 1) }
So it assigns buckets purely from each value's rank, which is not the same as splitting into quantile categories, as xtile does.
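To make the rank-based behaviour concrete, here is a minimal base R sketch of the formula quoted above. The name ntile_sketch is hypothetical (it is not part of dplyr), and rank(ties.method = "first") is assumed to stand in for row_number() on inputs without missing values.

```r
# Hypothetical helper: base R sketch of the rank-based bucketing formula.
# Assumes x has no NA values.
ntile_sketch <- function(x, n) {
  # Ties are broken by position, so every element gets a distinct rank.
  r <- rank(x, ties.method = "first")
  floor(n * (r - 1) / length(x) + 1)
}

ntile_sketch(1:10, 5)
# [1] 1 1 2 2 3 3 4 4 5 5

# Equal values can land in different buckets, because only the rank matters:
ntile_sketch(c(1, 1, 1, 1, 5, 6), 2)
# [1] 1 1 1 2 2 2
```

In the second call the fourth 1 ends up in bucket 2, which is what you would expect from a "rough rank" but not from quantile cut points.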
Looking at the code for statar::xtile leads to statar::pctile, and the documentation for statar says that:
pctile computes quantile and weighted quantile of type 2 (similarly to Stata _pctile)
Therefore an equivalent to statar::xtile in base R is:
.bincode(1:10, quantile(1:10, seq(0, 1, length.out = 5 + 1), type = 2),
include.lowest = TRUE)
# [1] 1 1 2 2 3 3 3 4 5 5
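If you need this in several places, the one-liner can be wrapped in a small function. This is only a sketch: xtile_base is a hypothetical name, it assumes x has no missing values, and it makes no attempt to reproduce statar's handling of weights or other options.

```r
# Hypothetical wrapper around the base R expression above.
# Assumes x is numeric with no NA values.
xtile_base <- function(x, n) {
  # n + 1 cut points from the minimum to the maximum, using type 2 quantiles.
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1),
                     type = 2, names = FALSE)
  # include.lowest = TRUE puts the minimum into the first bin instead of NA.
  .bincode(x, breaks, include.lowest = TRUE)
}

groups <- xtile_base(1:10, 5)
table(groups)  # how many observations fall into each quantile category
```

Unlike the rank-based formula, the group sizes here depend on where the quantile cut points fall, so they are not guaranteed to be equal.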
Upvotes: 4