Here are some experiments in RStudio with an R Markdown file:
---
title: "test"
author: "qed"
date: "10/10/2016"
output: html_document
---
```{r}
library(ISLR)
set.seed(3)
Wage$age = jitter(Wage$age)
get_breaks = function(cutted) {
  labels = levels(cutted)
  lower = as.numeric(sub("\\((.+),.*", "\\1", labels))
  upper = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", labels[length(labels)]))
  c(lower, upper)
}
age_groups = cut(Wage$age, 4)
age_groups1 = cut(Wage$age, get_breaks(age_groups))
all(levels(age_groups) == levels(age_groups1))
idx = which(age_groups != age_groups1)
idx # not empty!
```
If you knit it, you will see that `idx` is not empty.
RStudio version 0.99.903
R version 3.3.1
Essentially, I tried to extract the breaks from the output of the `cut` function and apply them explicitly. I expected the new output to be exactly the same as the old one, but it is not.
Is this a bug? How can I fix it?
Actually, after trying this repeatedly in the R console, I found that the same problem exists there too, so it's not an RStudio bug. Even more troubling, the behavior doesn't seem to be deterministic in spite of `set.seed`.
You think the two ways of cutting the vector are equivalent, but they are not. This issue has nothing to do with RStudio or knitr. It is easy to reproduce the problem in a plain R session:
```r
library(ISLR)

problem = function() {
  set.seed(NULL) # reinitialize the random seed so each call jitters differently
  Wage$age.jittered = jitter(Wage$age) # modifies a local copy of Wage
  get_breaks = function(cutted) {
    labels = levels(cutted)
    # lower endpoints of all intervals, parsed from labels like "(17.9,33.5]"
    lower = as.numeric(sub("\\((.+),.*", "\\1", labels))
    # upper endpoint of the last interval
    upper = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", labels[length(labels)]))
    c(lower, upper)
  }
  age_groups = cut(Wage$age.jittered, 4)
  age_groups1 = cut(Wage$age.jittered, get_breaks(age_groups))
  all(levels(age_groups) == levels(age_groups1))
  idx = which(age_groups != age_groups1)
  length(idx) # number of observations assigned to a different interval
}

res = replicate(1000, problem())
barplot(table(res))
```
You'd expect the barplot to have a non-zero frequency only at 0, but `length(idx)` is non-zero quite often.
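If ISLR isn't installed, the same effect can be shown with base R alone. This is a minimal sketch that assumes uniform random numbers are an adequate stand-in for the jittered ages; `count_mismatches` is a hypothetical helper name, not part of any package. Unlike `which(age_groups != age_groups1)`, it also counts observations that fall outside the parsed breaks and therefore become `NA`.

```r
# Parse breaks back out of the printed labels (same regexes as in the question)
get_breaks <- function(cutted) {
  labels <- levels(cutted)
  lower <- as.numeric(sub("\\((.+),.*", "\\1", labels))
  upper <- as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", labels[length(labels)]))
  c(lower, upper)
}

# Hypothetical helper: how many points change interval after the round trip
count_mismatches <- function(x, n = 4) {
  g1 <- cut(x, n)                # breaks chosen internally by cut()
  g2 <- cut(x, get_breaks(g1))   # breaks recovered from the rounded labels
  # compare integer codes; an NA in g2 means the point fell outside the parsed breaks
  sum(is.na(g2) | as.integer(g1) != as.integer(g2))
}

set.seed(1)
# assumption: runif() data stands in for the jittered Wage$age column
res <- replicate(1000, count_mismatches(runif(3000, 18, 80)))
table(res > 0)
```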
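Back to your question: the labels you see are not necessarily the exact endpoints; they may be rounded. By default `cut` formats the endpoints in the labels with 3 significant digits (see the argument `dig.lab` in the help page `?cut`), so breaks parsed back from the labels differ slightly from the ones `cut` actually used, and observations lying between the two end up in a different interval.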
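One way around this is to avoid parsing the labels at all. The sketch below recomputes the equal-width breaks the way `cut.default()` does when `breaks` is a single number, based on my reading of its source (split the range into `n` equal pieces, then move the outer limits out by 0.1% of the range). `equal_width_breaks` is a hypothetical helper name, the data are made up, and the zero-range case is not handled. The last two lines show that `dig.lab` only changes how the labels are printed.

```r
# Hypothetical helper mirroring cut.default()'s break computation for a single
# number of intervals (assumption: based on reading its source; not a public API)
equal_width_breaks <- function(x, n) {
  rx <- range(x, na.rm = TRUE)
  dx <- diff(rx)                      # assumes dx > 0 (not all values equal)
  breaks <- seq.int(rx[1L], rx[2L], length.out = n + 1L)
  # cut() widens the outer limits by 0.1% of the range
  breaks[c(1L, n + 1L)] <- c(rx[1L] - dx / 1000, rx[2L] + dx / 1000)
  breaks
}

set.seed(3)
x <- runif(3000, 18, 80)              # stand-in for the jittered ages
g_auto  <- cut(x, 4)
g_exact <- cut(x, equal_width_breaks(x, 4))
identical(as.integer(g_auto), as.integer(g_exact))

# dig.lab only affects the printed labels, not the intervals themselves
levels(cut(x, 4))
levels(cut(x, 4, dig.lab = 10))
```

Parsing the labels from `cut(x, 4, dig.lab = 10)` would shrink the discrepancy but not guarantee an exact match, since those labels are still rounded; keeping the breaks as numbers avoids the round trip through text.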