Reputation: 2090
I guess there are several ways to do this, so the answers to this question could be subjective, if not opinionated. I will therefore try to narrow the problem down and give you the details of what I have already done.
I am working with the R6 package and I have created a NumericInterval R6Class which has two fields, lower_bound and upper_bound:
require(R6)
NumericInterval <-
R6Class(
"NumericInterval",
public = list(
lower_bound = NA,
upper_bound = NA,
initialize = function(low, up) {
self$lower_bound <- low
self$upper_bound <- up
},
as_character = function() {
paste0("[", self$lower_bound, ", ",
self$upper_bound, "]")}))
I have also used the S3 generic method system to get as.character and print methods for the NumericInterval type:
as.character.NumericInterval <- function(x, ...) {
x$as_character()}
print.NumericInterval <- function(x, ...) {
x$as_character()}
Now I can do this (and the same with print):
> as.character(NumericInterval$new(0, pi))
[1] "[0, 3.14159265358979]"
What do I need to do now to use this new type as a data.frame column type?
For example I want to be able to do this:
(df <- data.frame(
X = c("I1", "I2", "I3"),
Y = c(NumericInterval$new(0,1),
NumericInterval$new(1,2),
NumericInterval$new(2,3))))
and get:
   X      Y
1 I1 [0, 1]
2 I2 [1, 2]
3 I3 [2, 3]
but I get:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ‘c("NumericInterval", "R6")’ to a data.frame
Of course, I also want to be able to access the objects and do things like:
df[2, 2]$lower_bound <- 0
Tibbles seem to be a solution:
(df <- tibble(
X = c("I1", "I2", "I3"),
Y = c(NumericInterval$new(0,1),
NumericInterval$new(1,2),
NumericInterval$new(2,3))))
produces:
# A tibble: 3 x 2
  X     Y
  <chr> <list>
1 I1    <NmrcIntr>
2 I2    <NmrcIntr>
3 I3    <NmrcIntr>
And each NumericInterval is placed as expected, e.g.:
> require(dplyr)
> df[1, 2] %>% pull
[[1]]
<NumericInterval>
Public:
as_character: function ()
clone: function (deep = FALSE)
initialize: function (low, up)
lower_bound: 0
upper_bound: 1
But neither the printed output of the tibble nor the way to access the object is what I expect.
Upvotes: 4
Views: 461
Reputation: 173858
There are some design decisions you need to make before you can coerce R6 objects into a data frame. Perhaps the most important is at what level you want vectorization to occur.
In your example you have "atomic" NumericIntervals which you put into a vector. This certainly has some advantages, but the major disadvantage is that when you try to use base R vector functions on a collection of NumericIntervals, R treats the objects as environments (which is what R6 objects are). That means you will not get the kind of behaviour you are looking for, because you want R to handle a vector of these environments differently from how it normally handles a vector of environments. In other words, to handle this way of working, you need to define another class with methods to manage the vector operations. This is possible, but seems complicated, messy and inefficient.
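As a rough illustration of that point, the sketch below uses a stripped-down, hypothetical Interval class (not the class from the question) to inspect what c() returns when given R6 objects. It assumes only that the R6 package is installed.

```r
library(R6)

# Hypothetical minimal class, used only to inspect what c() produces
Interval <- R6Class("Interval",
  public = list(
    lower_bound = NA,
    upper_bound = NA,
    initialize = function(low, up) {
      self$lower_bound <- low
      self$upper_bound <- up
    }
  )
)

v <- c(Interval$new(0, 1), Interval$new(1, 2))

class(v)                # a plain list, not an "Interval vector"
is.environment(v[[1]])  # each element is still an environment
length(v)               # one list element per object
```

Because v is just a list of environments, functions such as data.frame() have no way of knowing that the elements should be treated as interval values.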
It seems to me that it would be better to keep the vectorization within a single R6 object - that is, to store whole vectors in lower_bound and upper_bound within a single R6 object. The R6 class can be made to handle printing, formatting and subsetting, and can act as an entire column in a data frame itself.
To do all this, you first need to define some R6 specializations of generic functions:
`[.R6` <- function(x, ...) x$`[`(...)
`[<-.R6` <- function(x, ...) x$`[<-`(...)
length.R6 <- function(x) x$length()
format.R6 <- function(x, ...) x$format(...)
as.data.frame.R6 <- function(x, ...) x$as.data.frame()
Having these as .R6 methods rather than NumericInterval methods allows you to use them in multiple different classes.
Now we can define our class with the specializations we need:
NumericInterval <- R6Class("NumericInterval",
public = list(
lower_bound = NA,
upper_bound = NA,
initialize = function(low, up) {
self$lower_bound <- low
self$upper_bound <- up
},
`[` = function(n){
return(NumericInterval$new(self$lower_bound[n], self$upper_bound[n]))
},
`[<-` = function(n, value){
# R passes the right-hand side of x[n] <- ... as an argument named `value`
self$lower_bound[n] <- value[1]
self$upper_bound[n] <- value[2]
invisible(self)
},
length = function() length(self$lower_bound),
as.data.frame = function(...) {
structure(
list(interval = self),
class = "data.frame",
row.names = seq_along(self$lower_bound))
},
as_character = function() {
paste0("[", self$lower_bound, ", ",
self$upper_bound, "]")},
format = function(...) self$as_character(),
print = function() {
print(self$as_character(), quote = FALSE)
invisible(self)}))
This produces the following behaviour:
a <- NumericInterval$new(1:3, 4:6)
a
#> [1] [1, 4] [2, 5] [3, 6]
as.data.frame(a)
#> interval
#> 1 [1, 4]
#> 2 [2, 5]
#> 3 [3, 6]
df <- data.frame(id = LETTERS[1:3], interval = a)
df
#> id interval
#> 1 A [1, 4]
#> 2 B [2, 5]
#> 3 C [3, 6]
df[1,]
#> id interval
#> 1 A [1, 4]
df$interval[1]$lower_bound
#> [1] 1
This is of course not production-level code. In particular, you would need to include error handling to ensure that the upper and lower bounds are the same length, and are both numeric.
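A minimal sketch of what such checks might look like in the constructor is shown below. The class name ValidatedInterval and the error messages are illustrative assumptions, and only initialize is included to keep the example short.

```r
library(R6)

# Hypothetical class showing only the validating constructor
ValidatedInterval <- R6Class("ValidatedInterval",
  public = list(
    lower_bound = NA,
    upper_bound = NA,
    initialize = function(low, up) {
      # Both bounds must be numeric (integer vectors count as numeric)
      if (!is.numeric(low) || !is.numeric(up)) {
        stop("`low` and `up` must both be numeric")
      }
      # Bounds are paired element-wise, so the lengths must match
      if (length(low) != length(up)) {
        stop("`low` and `up` must have the same length")
      }
      self$lower_bound <- low
      self$upper_bound <- up
    }
  )
)

ok <- ValidatedInterval$new(1:3, 4:6)

# Mismatched lengths are rejected; try() captures the error
bad <- try(ValidatedInterval$new(1:3, 4:5), silent = TRUE)
inherits(bad, "try-error")
```

You might also want to check that each lower bound does not exceed its upper bound, depending on whether empty or reversed intervals are meaningful for your use case.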
Upvotes: 5