Reputation: 2592
In R and using dplyr, I need to cut values in one column using non-constant breaks (i.e. not the same for every row), these being defined, for each row, by values in other columns of the same data frame. Hence I use rowwise. While the breaks seem to work functionally (i.e. they are updated for each row), the labels do not appear to be consistent.
For example:
library(dplyr)
set.seed(10)
myDF = data.frame(a = runif(5, min=0.3, max=0.7),
                  bmin = rep(0, 5),
                  bmid = c(0.5, 0.3, 0.6, 0.7, 0.4),
                  bmax = rep(1, 5))
myDF %>% rowwise() %>% mutate(grp1 = cut(a, breaks=c(bmin, bmid, bmax)),
                              grp2 = cut(a, breaks=c(bmin, bmid, bmax),
                                         labels=c(paste(bmin, bmid, sep='-'),
                                                  paste(bmid, bmax, sep='-'))),
                              grp3 = cut(a, breaks=c(bmin, bmid, bmax),
                                         labels=c(1, 2)))
#           a  bmin  bmid  bmax    grp1   grp2   grp3
#       (dbl) (dbl) (dbl) (dbl)  (fctr) (fctr) (fctr)
# 1 0.3901746     0   0.5     1 (0,0.5]  0-0.5      1
# 2 0.4098122     0   0.3     1 (0.5,1]  0.5-1      2
# 3 0.4089220     0   0.6     1 (0,0.5]  0-0.5      1
# 4 0.5463317     0   0.7     1 (0,0.5]  0-0.5      1
# 5 0.4718686     0   0.4     1 (0.5,1]  0.5-1      2
In this example, one can see e.g. in row 2 that the cut is functionally correct (i.e. the value 0.3 was properly used as cut point bmid instead of 0.5 from the first row), but the resulting label is wrong (i.e. (0.5,1] does not actually contain the value 0.4098122, and 0.5 was not the cut point).
grp2 is an attempt to set the labels manually, which fails too, meaning that a manual, breaks-independent solution as in grp3 appears to be the only way forward...
In short, rowwise seems to apply to the cut points but not to the labels...
Am I missing anything, or is this incorrect behaviour? How can I label my intervals on a rowwise basis?
Upvotes: 3
Views: 330
Reputation: 206167
The problem is that you are trying to build a factor column where each row has different levels/labels. This is not really possible with factors, since a factor carries a single set of levels for the whole column. The mutate seems to be trying to harmonize all the factor labels for you, which is producing this odd effect. It's not unique to cut(); see also
data.frame(z=c("a","b","c")) %>% rowwise() %>% mutate(g=factor(z))
#        z      g
#   (fctr) (fctr)
# 1      a      a
# 2      b      a
# 3      c      a
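To see why the labels in the question come out the way they do, here is a small base R sketch. It only illustrates how factor codes and levels interact. It is not a statement about what dplyr does internally, but the result is consistent with the output shown in the question. The two values of a are copied from rows 1 and 2 of that output.

```r
# Rows 1 and 2 from the question, each cut with its own breaks
f1 <- cut(0.3901746, breaks = c(0, 0.5, 1))
f2 <- cut(0.4098122, breaks = c(0, 0.3, 1))

levels(f1)  # "(0,0.5]" "(0.5,1]"
levels(f2)  # "(0,0.3]" "(0.3,1]"

# A factor is stored as an integer code plus one shared vector of levels
code2 <- as.integer(f2)  # 2, i.e. the second interval: the cut itself is right

# Reading row 2's code against row 1's levels gives the misleading label
relabelled <- levels(f1)[code2]
relabelled  # "(0.5,1]"
```

So the integer code for row 2 is correct, but once it is shown with the levels of row 1 the label no longer matches the breaks that were actually used.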
One workaround would be to return character values rather than factor values.
myDF %>% rowwise() %>% mutate(grp1 = as.character(cut(a, breaks=c(bmin, bmid, bmax))))
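The same idea can be extended to the custom labels attempted in grp2. Below is a minimal sketch in base R, using mapply instead of rowwise. label_row is a hypothetical helper name introduced here for illustration, and the values of a are copied from the output printed in the question rather than regenerated with runif.

```r
# Values of `a` copied from the output shown in the question
myDF <- data.frame(a    = c(0.3901746, 0.4098122, 0.4089220, 0.5463317, 0.4718686),
                   bmin = rep(0, 5),
                   bmid = c(0.5, 0.3, 0.6, 0.7, 0.4),
                   bmax = rep(1, 5))

# Hypothetical helper: cut a single value with its own breaks and
# return the matching label as a character string, not a factor
label_row <- function(a, lo, mid, hi) {
  as.character(cut(a, breaks = c(lo, mid, hi),
                   labels = c(paste(lo, mid, sep = "-"),
                              paste(mid, hi, sep = "-"))))
}

# Apply the helper row by row; the result is a plain character column
myDF$grp2 <- mapply(label_row, myDF$a, myDF$bmin, myDF$bmid, myDF$bmax)
myDF$grp2
# "0-0.5" "0.3-1" "0-0.6" "0-0.7" "0.4-1"
```

Because the column is character, each row can keep the label built from its own breaks. If a factor is needed later, it can be created once from the whole column so that all levels are known together.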
Upvotes: 6