pietrodito

Reputation: 2090

How to create a new type which plays well inside data.frame?

I guess there are several ways to do this, so answers to this question could be subjective, if not opinionated. I will therefore try to narrow the problem down and give the details of what I have already done.

Context

I am working with the R6 package and I have created a NumericInterval R6Class which has two fields, lower_bound and upper_bound:

require(R6)
NumericInterval <-
  R6Class(
        "NumericInterval",
        public = list(
          lower_bound = NA,
          upper_bound = NA,
          initialize = function(low, up) {
            self$lower_bound <- low
            self$upper_bound <- up
          },
          as_character = function() {
            paste0("[", self$lower_bound, ", ",
                        self$upper_bound, "]")}))

I have also used the S3 generic method system to get as.character and print methods for the NumericInterval type:

as.character.NumericInterval <- function(x, ...) {
  x$as_character()}
print.NumericInterval <- function(x, ...) {
  x$as_character()}

Now I can do this (and the same with print):

> as.character(NumericInterval$new(0, pi))

[1] "[0, 3.14159265358979]"

Question:

What do I need to do now to use this new type as a data.frame column type?

For example, I want to be able to do this:

(df <- data.frame(
   X = c("I1", "I2", "I3"),
   Y = c(NumericInterval$new(0,1),
         NumericInterval$new(1,2),
         NumericInterval$new(2,3))))

and get:

   X      Y
1 I1 [0, 1]
2 I2 [1, 2]
3 I3 [2, 3]

but I get:

Error in as.data.frame.default(x[[i]], optional = TRUE) :
  cannot coerce class ‘c("NumericInterval", "R6")’ to a data.frame

Of course, I also want to be able to access the objects and do things like:

df[2, 2]$lower_bound <- 0

tibbles seem to be a solution

library(tibble)
(df <- tibble(
   X = c("I1", "I2", "I3"),
   Y = c(NumericInterval$new(0,1),
         NumericInterval$new(1,2),
         NumericInterval$new(2,3))))

produces:

# A tibble: 3 x 2
  X     Y
  <chr> <list>
1 I1    <NmrcIntr>
2 I2    <NmrcIntr>
3 I3    <NmrcIntr>

And each NumericInterval is stored as expected, e.g.:

> require(dplyr)
> df[1, 2] %>% pull


[[1]]
<NumericInterval>
  Public:
    as_character: function ()
    clone: function (deep = FALSE)
    initialize: function (low, up)
    lower_bound: 0
    upper_bound: 1

But the printed output of the tibble and the way to access the objects are not what I expect.

Upvotes: 4

Views: 461

Answers (1)

Allan Cameron

Reputation: 173858

There are some design decisions you need to make before you can coerce R6 objects into a data frame. Perhaps the most important is at what level you want vectorization to occur.

In your example you have "atomic" NumericIntervals which you put into a vector, and this certainly has some advantages, but the major disadvantage is that when you try to use base R vector functions on a collection of NumericIntervals, R treats the objects as environments (which is what R6 objects are). That means you will not get the kind of behaviour you are looking for, because you want R to handle a vector of these environments differently from how it normally handles a vector of environments. In other words, to handle this way of working, you need to define another class with methods to manage the vector operations. This is possible, but seems complicated, messy and inefficient.
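A minimal sketch of that point, assuming the R6 package is installed. The `Point` class is a hypothetical stand-in, not the question's class. Combining R6 objects with c() gives a plain list of environments, which base R has no special vector semantics for:

```r
library(R6)

# Hypothetical one-field class, used only for illustration
Point <- R6Class("Point",
  public = list(
    x = NA,
    initialize = function(x) {
      self$x <- x
    }))

# c() cannot build an atomic vector from environments, so it returns a list
objs <- c(Point$new(1), Point$new(2))

is.list(objs)               # TRUE
is.environment(objs[[1]])   # TRUE: each element is an R6 object, i.e. an environment
objs[[2]]$x                 # 2
```

This is why the tibble in the question reports the column type as a list.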

It seems to me that it would be better to keep the vectorization within a single R6 object - that is, to have vectors of lower_bounds and upper_bounds within a single R6 object. The R6 class can be made to handle printing, formatting and subsetting, and can act as an entire column in a data frame itself.

To do all this, you first need to define some R6 specializations of generic functions:

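# Forward base generics to same-named methods on the R6 object
`[.R6` <- function(x, ...) x$`[`(...)
`[<-.R6` <- function(x, ...) x$`[<-`(...)
length.R6 <- function(x) x$length()
# format() receives extra arguments (e.g. justify) when a data frame is printed
format.R6 <- function(x, ...) x$format(...)
as.data.frame.R6 <- function(x, ...) x$as.data.frame()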

Defining these as .R6 methods rather than .NumericInterval methods lets you reuse them across multiple R6 classes.
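The reason this works is that every R6 object carries "R6" as the last element of its class vector, so S3 dispatch falls through to a .R6 method when no class-specific method exists. A small sketch of that mechanism, using a hypothetical generic `describe` and two hypothetical classes `A` and `B` (none of these names come from the code in this answer):

```r
library(R6)

# Hypothetical S3 generic plus a single forwarding method for all R6 objects
describe <- function(x, ...) UseMethod("describe")
describe.R6 <- function(x, ...) x$describe()

# Two unrelated R6 classes, each providing its own describe() method
A <- R6Class("A", public = list(describe = function() "an A"))
B <- R6Class("B", public = list(describe = function() "a B"))

class(A$new())      # "A" "R6"
describe(A$new())   # "an A"
describe(B$new())   # "a B"
```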

Now we can define our class with the specializations we need:

NumericInterval <- R6Class("NumericInterval",
        public = list(
          lower_bound = NA,
          upper_bound = NA,
          initialize = function(low, up) {
            self$lower_bound <- low
            self$upper_bound <- up
          },
          `[` = function(n){
            return(NumericInterval$new(self$lower_bound[n], self$upper_bound[n]))
          },
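          # R passes the replacement as a named `value` argument, so the parameter must use that name
          `[<-` = function(n, value){
            self$lower_bound[n] <- value[1]
            self$upper_bound[n] <- value[2]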
            invisible(self)
          },
          length = function() length(self$lower_bound), 
          as.data.frame = function(...) {
            structure(
              list(interval = self),  # the object itself becomes the column, not a global variable
              class = "data.frame", 
              row.names = seq_along(self$lower_bound))
          },
          as_character = function() {
            paste0("[", self$lower_bound, ", ",
                        self$upper_bound, "]")},
          format = function(...) self$as_character(),
          print = function() {
            print(self$as_character(), quote = FALSE)
            invisible(self)}))

Which produces the following behaviour:

a <- NumericInterval$new(1:3, 4:6)
a
#> [1] [1, 4] [2, 5] [3, 6]

as.data.frame(a)
#>   interval
#> 1   [1, 4]
#> 2   [2, 5]
#> 3   [3, 6]

df <- data.frame(id = LETTERS[1:3], interval = a)
df
#>   id interval
#> 1  A   [1, 4]
#> 2  B   [2, 5]
#> 3  C   [3, 6]

df[1,]
#>   id interval
#> 1  A   [1, 4]

df$interval[1]$lower_bound
#> [1] 1

This is of course not production-level code. In particular, you would need to include error handling to ensure that the upper and lower bounds are the same length, and are both numeric.
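As a rough sketch of what that error handling might look like, here is a hypothetical helper, `validate_bounds`, which could be called at the top of initialize. The name and the error messages are assumptions for illustration only:

```r
# Hypothetical helper: check the bounds before they are stored in the object
validate_bounds <- function(low, up) {
  if (!is.numeric(low) || !is.numeric(up)) {
    stop("`low` and `up` must both be numeric", call. = FALSE)
  }
  if (length(low) != length(up)) {
    stop("`low` and `up` must have the same length", call. = FALSE)
  }
  invisible(TRUE)
}

validate_bounds(1:3, 4:6)   # passes silently

# A length mismatch raises an error, captured here as its message
msg <- tryCatch(validate_bounds(1:3, 4:5),
                error = function(e) conditionMessage(e))
msg
```

Inside the class, the first line of initialize would then be validate_bounds(low, up), so an invalid object is never created.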

Upvotes: 5
