Reputation: 12074
I have a data.table that contains a column called values. I'd like to create a factor from this column using cut. Some intervals should share the same label (here, NA); others should not. For example:
# Set RNG seed
set.seed(-1)
# Load library
library(data.table)
# Create data table
dt <- data.table(values = runif(1000))
# Divide vector into groups
dt[, group := cut(values,
                  breaks = c(-Inf, 0.2, 0.4, 0.6, 0.8, Inf),
                  labels = c(NA, "foo", NA, "bar", NA))]
dt
#> Error in as.character.factor(x): malformed factor
Created on 2019-09-25 by the reprex package (v0.3.0)
As you can see, this produces an error:
Error in as.character.factor(x): malformed factor
When I do the cut outside of data.table, it seems to work fine:
# Set RNG seed
set.seed(-1)
# Load library
library(data.table)
# Create data table
dt <- data.table(values = runif(1000))
# Outside of data table
cut(dt$values,
    breaks = c(-Inf, 0.2, 0.4, 0.6, 0.8, Inf),
    labels = c(NA, "foo", NA, "bar", NA))
#> [1] <NA> <NA> <NA> <NA> foo <NA> foo <NA> foo foo bar foo foo
#> [14] <NA> foo <NA> <NA> <NA> foo bar foo <NA> foo bar foo foo
#> [27] foo <NA> <NA> <NA> <NA> bar <NA> <NA> bar bar foo foo foo
#> [40] <NA> <NA> <NA> foo <NA> <NA> <NA> <NA> foo <NA> foo bar bar
#> [53] <NA> foo <NA> <NA> foo <NA> foo <NA> foo <NA> <NA> <NA> <NA>
#> [66] foo foo <NA> bar bar <NA> <NA> <NA> foo bar bar <NA> <NA>
#> [79] <NA> <NA> foo bar bar bar bar bar <NA> bar <NA> <NA> <NA>
#> [92] <NA> <NA> <NA> <NA> <NA> foo <NA> foo foo foo <NA> <NA> <NA>
#> [105] foo <NA> foo <NA> bar <NA> <NA> <NA> foo bar <NA> bar foo
#> [118] foo <NA> <NA> <NA> <NA> <NA> <NA> bar <NA> <NA> <NA> <NA> <NA>
#> [131] <NA> foo <NA> <NA> <NA> bar <NA> <NA> foo foo <NA> <NA> foo
#> [144] <NA> <NA> <NA> <NA> <NA> <NA> <NA> foo bar bar <NA> <NA> <NA>
#> [157] <NA> foo <NA> <NA> foo bar bar foo <NA> <NA> <NA> foo <NA>
#> [170] <NA> <NA> bar <NA> <NA> <NA> <NA> <NA> foo foo <NA> <NA> foo
#> [183] <NA> <NA> <NA> foo bar <NA> foo <NA> bar foo <NA> <NA> bar
#> [196] foo <NA> <NA> foo bar <NA> <NA> bar <NA> <NA> bar bar <NA>
#> [209] <NA> bar bar bar <NA> <NA> foo bar <NA> bar <NA> bar foo
#> [222] bar <NA> <NA> foo bar bar bar foo <NA> bar <NA> <NA> <NA>
#> [235] <NA> bar <NA> foo foo foo <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> [248] <NA> <NA> <NA> <NA> <NA> foo <NA> bar <NA> bar <NA> bar bar
#> [261] <NA> <NA> <NA> <NA> foo <NA> <NA> <NA> <NA> foo <NA> bar <NA>
#> [274] <NA> <NA> <NA> <NA> bar <NA> <NA> bar bar bar foo <NA> foo
#> [287] foo <NA> <NA> <NA> <NA> bar <NA> <NA> <NA> foo foo <NA> <NA>
#> [300] foo <NA> <NA> <NA> bar <NA> <NA> <NA> <NA> <NA> <NA> <NA> bar
#> [313] foo bar <NA> <NA> <NA> <NA> foo <NA> <NA> <NA> <NA> <NA> <NA>
#> [326] <NA> <NA> foo bar <NA> foo bar <NA> bar bar <NA> <NA> bar
#> [339] <NA> <NA> <NA> <NA> <NA> <NA> bar foo <NA> <NA> <NA> bar <NA>
#> [352] bar foo <NA> foo <NA> <NA> foo <NA> <NA> <NA> bar <NA> foo
#> [365] foo <NA> <NA> <NA> bar <NA> <NA> <NA> bar foo foo foo <NA>
#> [378] <NA> <NA> <NA> <NA> foo <NA> <NA> <NA> foo <NA> bar bar <NA>
#> [391] bar bar <NA> foo <NA> bar <NA> bar <NA> foo <NA> foo foo
#> [404] <NA> <NA> <NA> <NA> <NA> foo foo bar <NA> bar foo <NA> foo
#> [417] <NA> bar <NA> <NA> foo <NA> <NA> <NA> <NA> <NA> bar foo bar
#> [430] <NA> <NA> bar foo <NA> <NA> <NA> <NA> <NA> <NA> foo <NA> <NA>
#> [443] <NA> foo <NA> bar <NA> foo foo bar <NA> <NA> <NA> bar <NA>
#> [456] foo <NA> <NA> <NA> <NA> foo <NA> <NA> bar foo foo <NA> <NA>
#> [469] <NA> <NA> bar <NA> foo foo <NA> <NA> <NA> <NA> foo <NA> <NA>
#> [482] bar foo bar <NA> <NA> foo <NA> foo foo <NA> <NA> <NA> <NA>
#> [495] foo <NA> <NA> <NA> <NA> foo foo bar foo <NA> <NA> <NA> <NA>
#> [508] <NA> <NA> <NA> <NA> <NA> <NA> foo <NA> foo bar bar <NA> foo
#> [521] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> foo <NA> bar bar foo
#> [534] <NA> foo foo bar <NA> <NA> <NA> bar <NA> <NA> foo bar bar
#> [547] <NA> <NA> <NA> bar foo bar <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> [560] <NA> <NA> bar <NA> <NA> <NA> foo <NA> <NA> <NA> <NA> <NA> <NA>
#> [573] foo <NA> foo <NA> bar foo foo bar <NA> <NA> <NA> <NA> bar
#> [586] foo <NA> foo <NA> bar <NA> <NA> foo <NA> <NA> <NA> <NA> <NA>
#> [599] foo <NA> <NA> foo <NA> bar foo <NA> <NA> <NA> bar <NA> bar
#> [612] foo foo bar <NA> <NA> bar bar foo bar <NA> <NA> <NA> bar
#> [625] <NA> foo <NA> bar <NA> <NA> <NA> <NA> foo bar bar <NA> foo
#> [638] <NA> bar <NA> <NA> <NA> foo <NA> foo bar <NA> bar <NA> <NA>
#> [651] <NA> <NA> bar foo <NA> <NA> bar <NA> foo foo foo <NA> foo
#> [664] <NA> foo <NA> <NA> <NA> <NA> <NA> <NA> <NA> bar <NA> <NA> <NA>
#> [677] foo <NA> <NA> bar bar <NA> foo <NA> <NA> <NA> <NA> <NA> bar
#> [690] <NA> <NA> foo bar foo <NA> <NA> <NA> bar foo bar <NA> bar
#> [703] <NA> <NA> foo <NA> <NA> bar <NA> <NA> foo <NA> <NA> <NA> bar
#> [716] foo bar <NA> foo bar <NA> <NA> <NA> bar <NA> <NA> <NA> bar
#> [729] <NA> foo foo <NA> <NA> bar <NA> bar foo <NA> <NA> <NA> <NA>
#> [742] bar <NA> <NA> foo foo <NA> <NA> <NA> <NA> <NA> <NA> bar foo
#> [755] <NA> foo <NA> <NA> <NA> <NA> bar foo <NA> <NA> <NA> foo bar
#> [768] bar <NA> <NA> <NA> <NA> <NA> bar foo foo bar <NA> <NA> bar
#> [781] foo foo <NA> <NA> foo foo bar <NA> foo bar <NA> foo <NA>
#> [794] foo <NA> bar <NA> foo foo <NA> <NA> bar foo <NA> foo <NA>
#> [807] <NA> <NA> <NA> <NA> bar foo <NA> foo foo bar <NA> bar <NA>
#> [820] <NA> bar bar <NA> bar <NA> <NA> foo bar <NA> <NA> <NA> bar
#> [833] <NA> foo foo <NA> foo <NA> <NA> <NA> <NA> bar foo bar bar
#> [846] bar <NA> <NA> <NA> foo bar foo <NA> <NA> bar <NA> foo <NA>
#> [859] <NA> foo <NA> <NA> bar bar bar <NA> foo <NA> <NA> <NA> <NA>
#> [872] foo <NA> <NA> foo <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> [885] <NA> <NA> bar bar <NA> <NA> <NA> <NA> foo <NA> bar <NA> <NA>
#> [898] <NA> bar <NA> <NA> <NA> <NA> <NA> foo foo <NA> <NA> <NA> foo
#> [911] <NA> bar bar bar bar bar <NA> <NA> bar foo bar <NA> <NA>
#> [924] <NA> <NA> <NA> foo bar bar bar foo <NA> <NA> foo foo <NA>
#> [937] bar <NA> <NA> bar <NA> bar <NA> <NA> <NA> bar <NA> <NA> <NA>
#> [950] bar <NA> foo <NA> <NA> foo bar <NA> <NA> <NA> <NA> <NA> <NA>
#> [963] foo <NA> foo <NA> <NA> <NA> <NA> <NA> foo <NA> bar foo <NA>
#> [976] bar bar <NA> bar <NA> foo <NA> <NA> foo <NA> <NA> bar foo
#> [989] <NA> <NA> <NA> bar foo bar foo bar <NA> <NA> bar <NA>
#> Levels: <NA> foo bar
Created on 2019-09-25 by the reprex package (v0.3.0)
Why is data.table getting upset?
sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.12.2
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.6.1 magrittr_1.5 tools_3.6.1 htmltools_0.3.6
#> [5] yaml_2.2.0 Rcpp_1.0.2 stringi_1.4.3 rmarkdown_1.15
#> [9] highr_0.8 knitr_1.25 stringr_1.4.0 xfun_0.9
#> [13] digest_0.6.21 evaluate_0.14
Upvotes: 1
Views: 518
Reputation: 12074
Here's a solution based on @sindri_baldur's insights above.
# Set RNG seed
set.seed(-1)
# Load library
library(data.table)
# Create data table
dt <- data.table(values = runif(1000))
# Divide vector into groups
dt[, group := factor(cut(values,
                         breaks = c(-Inf, 0.2, 0.4, 0.6, 0.8, Inf),
                         labels = c(NA, "foo", NA, "bar", NA)))]
dt
#> values group
#> 1: 0.4866672 <NA>
#> 2: 0.1913653 <NA>
#> 3: 0.9932719 <NA>
#> 4: 0.1467027 <NA>
#> 5: 0.2415895 foo
#> ---
#> 996: 0.6428781 bar
#> 997: 0.4525126 <NA>
#> 998: 0.9631253 <NA>
#> 999: 0.7285391 bar
#> 1000: 0.1713554 <NA>
Created on 2019-09-26 by the reprex package (v0.3.0)
By default, factor omits NA when creating levels (its exclude argument defaults to NA), which seems to make data.table happy.
This issue was resolved by bug fix #45 of v1.12.4, as detailed here.
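To see what the extra factor call changes, here is a minimal base R sketch on a small hand-picked vector (the vector x and the names g and g2 are made up for illustration; data.table is not involved). It only shows that cut with NA labels keeps NA as a factor level, and that re-wrapping in factor drops that level. I am assuming, not proving, that the NA level is what the older data.table version tripped over.

```r
# Small fixed vector, one value per interval, so the result is easy to check
x <- c(0.1, 0.3, 0.5, 0.7, 0.9)

# Same breaks and labels as in the question
g <- cut(x,
         breaks = c(-Inf, 0.2, 0.4, 0.6, 0.8, Inf),
         labels = c(NA, "foo", NA, "bar", NA))

# The duplicated NA labels collapse into a single level, but NA stays a level
levels(g)
anyNA(levels(g))

# factor() uses exclude = NA by default, so the NA level is dropped
g2 <- factor(g)
levels(g2)
anyNA(levels(g2))

# Values from the NA-labelled intervals are now ordinary missing values
is.na(g2)
```

If you would rather keep a single call, another option is to leave the cut result as it is and drop the NA level afterwards; the factor wrapper is simply the shortest way to do that.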
Upvotes: 1