David
David

Reputation: 10152

Custom S4 Class in data.frame/tibble/data.table

I am looking to use an S4 class, defined by me, inside of data.frames (and/or tibbles, data.tables, ...), similar to what lubridate::period() offers.

E.g., the following code creates a vector of periods, stores them in a data.frame (tibble in this case), extracts one value and accesses the class again...


tibble::tibble(
  x = c(lubridate::period(120), 
        lubridate::period(2))
)
#> # A tibble: 2 x 1
#>   x       
#>   <Period>
#> 1 120S    
#> 2 2S 

str(
  tibble::tibble(
    x = c(lubridate::period(120), 
          lubridate::period(2))
  )$x[1]
)
#> Formal class 'Period' [package "lubridate"] with 6 slots
#>   ..@ .Data : num 120
#>   ..@ year  : num 0
#>   ..@ month : num 0
#>   ..@ day   : num 0
#>   ..@ hour  : num 0
#>   ..@ minute: num 0

I would like to replicate this with the typical Person class.

A first (failed) try looks like this:

.person <- setClass("Person", 
                   slots = list(name = "character",
                                age = "numeric"))
person <- function(name, age) {
  .person(name = name, age = age)
}

format.Person <- function(x, ...) {
  paste0("<Person: ", x@name, " ", x@age, ">")
}


# create some instances
person("Alice", 123)
#> An object of class "Person"
#> Slot "name":
#>   [1] "Alice"
#> 
#> Slot "age":
#>   [1] 123

## Fair enough, no proper "show" method implemented yet...

format(person("Alice", 123))
#> [1] "<Person: Alice 123>"


ppl <- c(person("Alice", 123),
  person("Bob", 42))
ppl
#> [[1]]
#> An object of class "Person"
#> Slot "name":
#>   [1] "Alice"
#> 
#> Slot "age":
#>   [1] 123
#> 
#> 
#> [[2]]
#> An object of class "Person"
#> Slot "name":
#>   [1] "Bob"
#> 
#> Slot "age":
#>   [1] 42

## Now converting to data.frames

data.frame(x = ppl)
#> Error in unique.default(x, nmax = nmax) : 
#>   unique() applies only to vectors

tibble::tibble(x = ppl)
#> # A tibble: 2 x 1
#> x       
#> <list>  
#>   1 <Person>
#>   2 <Person>

data.table::data.table(x = ppl)
#> x
#> 1: <Person[2]>
#> 2: <Person[2]>

What I want in the end is to have something along the lines of this:

tibble::tibble(x = ppl)
#> # A tibble: 2 x 1
#>   x     
#>   <Person>
#> 1 <Alice, 123>    
#> 2 <Bob, 42>
#>

Upvotes: 2

Views: 263

Answers (2)

David
David

Reputation: 10152

Thanks to the hint of @JDL regarding the atomic types and looking back at the lubridate::period definition, I found this solution.

The key is (besides contains = "character") to have a slot named .Data

.person <- setClass("Person", 
                    contains = "character",
                    slots = list(.Data = "character",
                                 name = "character",
                                 age = "numeric"))
person <- function(name, age) {
  .person(.Data = name, name = name, age = age)
}
setMethod("length", "Person", function(x) {
  length([email protected])
})
setMethod("show", "Person", function(object) {
  print(format(object))
})
setMethod("c", "Person", function(x, ...) {
  dots <- list(...)
  nempty <- sapply(dots, length) != 0
  elements <- dots[nempty]#lapply(dots[nempty], as.Person)

  .Data <- c([email protected], unlist(lapply(elements, slot, ".Data")))
  name <- c(x@name, unlist(lapply(elements, slot, "name")))
  age <- c(x@age, unlist(lapply(elements, slot, "age")))
  new("Person", .Data = .Data, name = name, age = age)
})
# apparently not strictly needed here
# setMethod("cbind2", "Person", function(x, y, ...) {
#   c(x, y, ...)
# })
setMethod("[", "Person", function(x, i, j, ..., drop = TRUE) {
  new("Person", .Data = [email protected][i], name = x@name[i], age = x@age[i])
})

format.Person <- function(x, ...) {
  if (length(x) == 0) {
    return("<PERSON>")
  }
  
  paste0("<Person: ", x@name, " ", x@age, ">")
}

x <- person("Alice", 123)
x
#> [1] "<Person: Alice 123>"

format(person("Alice", 123))
#> [1] "<Person: Alice 123>"
c(x, x)
#> [1] "<Person: Alice 123>" "<Person: Alice 123>"

ppl <- c(person("Alice", 123),
         person("Bob", 42))
ppl
#> [1] "<Person: Alice 123>" "<Person: Bob 42>"
str(ppl)
#> Formal class 'Person' [package ".GlobalEnv"] with 3 slots
#>   ..@ .Data: chr [1:2] "Alice" "Bob"
#>   ..@ name : chr [1:2] "Alice" "Bob"
#>   ..@ age  : num [1:2] 123 42

data.frame(x = ppl)
#>       x
#> 1 Alice
#> 2   Bob
a <- tibble::tibble(x = ppl)
a
#> # A tibble: 2 x 1
#>   x       
#>   <Person>
#> 1 Alice   
#> 2 Bob    

str(a$x[1])
#> Formal class 'Person' [package ".GlobalEnv"] with 3 slots
#>   ..@ .Data: chr "Alice"
#>   ..@ name : chr "Alice"
#>   ..@ age  : num 123
data.table::data.table(x = ppl)
#>                      x
#> 1: <Person: Alice 123>
#> 2:    <Person: Bob 42>

Created on 2020-12-01 by the reprex package (v0.3.0)

Upvotes: 0

JDL
JDL

Reputation: 1654

The reason this hasn't worked is because data.frames (and similar objects like tibbles) can only contain atomic types in their columns (i.e. something that could be created from a call to vector). The atomic types are "logical", "integer", "double", "complex", "character", "raw" and "list".

In practice, that means your class will have to contain one of these atomic types (probably "list"), along the lines of setClass("onePerson",contains="list"). Having each slot be an atomic type is not enough on its own. If you want to enforce that the list has the correct components of the correct classes, then you can use validObject methods to enforce that.

It might be possible to stick with the class definition that you have that does not extend an atomic type, but you will need to write methods for functions like cbind2 and [ to allow it to interact with data frames.

Lastly, you could make your person class extend data.frame or tibble itself, with restrictions on the column names and types. This is essentially the same as the list solution, since a data frame is a list.

Upvotes: 2

Related Questions