Josh

Reputation: 1337

Populating a data frame with lists in R

I want to populate a column of a data frame with lists. Based on the example code I found here:

d <- data.frame(id=1:2, name=c("Jon", "Mark"))
d
d$children <-  list(list("Mary", "James"), list("Greta", "Sally"))
d

I expected that the following code would work:

d <- data.frame(id=1:2, name=c("Jon", "Mark"))
d
d["children"] <-  list(list("Mary", "James"), list("Greta", "Sally"))
d

but it gave this warning, and the column did not contain what I wanted:

Warning message:
In `[<-.data.frame`(`*tmp*`, "children", value = list(list("Mary",  :
  provided 2 variables to replace 1 variables

After reading this post and this answer, I changed the code to this:

d <- data.frame(id=1:2, name=c("Jon", "Mark"))
d
d["children"] <-  list(list(list("Mary", "James"), list("Greta", "Sally")))
d

which worked perfectly. What is going on here? What does the extra call to list() accomplish? Thanks

Upvotes: 2

Views: 7715

Answers (2)

jdobres

Reputation: 11957

There are a couple of things happening here. R behaves differently when you index with single brackets [ ] or double brackets [[ ]]. In short, single brackets treat a data frame as a list of columns: d["x"] returns a one-column data frame, and d["x"] <- value expects value to be a list with one element per selected column. Double brackets return (or replace) the column itself.

Note that the first example below, with single-brackets, retains the data frame column's structure and naming, while the double-bracket example returns the column's primitive contents as a vector.

> str(mtcars['mpg'])
'data.frame':   32 obs. of  1 variable:
 $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...

> str(mtcars[['mpg']])
 num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...

To answer your question of why the extra call to list() helps at all, str can shed some light on the matter:

The value in your original code, without the extra list(), is a list of length 2:

> str(list(list("Mary", "James"), list("Greta", "Sally")))

List of 2
 $ :List of 2
  ..$ : chr "Mary"
  ..$ : chr "James"
 $ :List of 2
  ..$ : chr "Greta"
  ..$ : chr "Sally"

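This triggers the warning because d['children'] selects a single column, so the replacement must be a list of length 1 (one element per column). R keeps only the first element, list("Mary", "James"), and drops the second. Adding the extra list() creates an "outer" list of length 1 whose single element is the whole two-row column, so the assignment succeeds.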

> str(list(list(list("Mary", "James"), list("Greta", "Sally"))))

List of 1
 $ :List of 2
  ..$ :List of 2
  .. ..$ : chr "Mary"
  .. ..$ : chr "James"
  ..$ :List of 2
  .. ..$ : chr "Greta"
  .. ..$ : chr "Sally"
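As a rough sketch of the counting rule described above: when a list is assigned through single brackets, each element of the list is treated as one column, so the number of elements has to match the number of columns selected. The column names a and b below are made up purely for illustration.

```r
d <- data.frame(id = 1:2, name = c("Jon", "Mark"))
add <- list(list("Mary", "James"), list("Greta", "Sally"))

# A bare list of two elements reads as "two columns" to `[<-`
n_bare <- length(add)

# Wrapping it once more gives a single element that holds the whole column
wrapped <- list(add)
n_wrapped <- length(wrapped)
same_inner <- identical(wrapped[[1]], add)

# Hypothetical columns "a" and "b": two names selected, two list elements supplied
d[c("a", "b")] <- list(1:2, c("x", "y"))

print(c(n_bare = n_bare, n_wrapped = n_wrapped))
print(names(d))
```

Here the two-element list on the right matches the two names on the left, so no warning is raised. With a single name on the left, only a one-element list lines up.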

Finally, your original code (without the extra list()) would have worked had you used double-bracket indexing:

d[["children"]] <-  list(list("Mary", "James"), list("Greta", "Sally"))
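If you want to convince yourself that the three working forms end up storing the same column, a minimal check along these lines should do it. The names base, d1, d2 and d3 are just placeholders for this sketch.

```r
add <- list(list("Mary", "James"), list("Greta", "Sally"))
base <- data.frame(id = 1:2, name = c("Jon", "Mark"))

# Three ways of attaching the same list column
d1 <- base; d1$children <- add            # dollar assignment
d2 <- base; d2[["children"]] <- add       # double brackets, bare list
d3 <- base; d3["children"] <- list(add)   # single brackets, wrapped list

# Compare the stored columns rather than the whole data frames
same_12 <- identical(d1$children, d2$children)
same_13 <- identical(d1$children, d3$children)
print(c(same_12, same_13))
```

The comparison is done on the children column only, since that is the part the question is about.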

Upvotes: 1

Josh

Reputation: 1337

@jdobres's answer got me playing with the following examples, which helped me understand (kinda) what's going on.

> d <- data.frame(id=1:2, name=c("Jon", "Mark"))
> d
  id name
1  1  Jon
2  2 Mark
> add <- list(list("Mary", "James"), list("Greta", "Sally"))
> d$children <- add
> d
  id name     children
1  1  Jon  Mary, James
2  2 Mark Greta, Sally
> str(d$children)
List of 2                                  # d$children is a list of 2
 $ :List of 2
  ..$ : chr "Mary"
  ..$ : chr "James"
 $ :List of 2
  ..$ : chr "Greta"
  ..$ : chr "Sally"
> str(add)
List of 2                                  # add is a list of 2
 $ :List of 2
  ..$ : chr "Mary"
  ..$ : chr "James"
 $ :List of 2
  ..$ : chr "Greta"
  ..$ : chr "Sally"

This works because the lhs and rhs of d$children <- add are both lists with 2 items.

> d <- data.frame(id=1:2, name=c("Jon", "Mark"))
> d
  id name
1  1  Jon
2  2 Mark
> add <- list(list("Mary", "James"), list("Greta", "Sally"))
> d["children"] <- add
Warning message:
In `[<-.data.frame`(`*tmp*`, "children", value = list(list("Mary",  :
  provided 2 variables to replace 1 variables
> d
  id name children
1  1  Jon     Mary
2  2 Mark    James
> str(d["children"])
'data.frame':   2 obs. of  1 variable:     # d["children"] is 1 var. with 2 obs.
 $ children:List of 2
  ..$ : chr "Mary"
  ..$ : chr "James"
> str(add)
List of 2                                  # add is a list of 2
 $ :List of 2
  ..$ : chr "Mary"
  ..$ : chr "James"
 $ :List of 2
  ..$ : chr "Greta"
  ..$ : chr "Sally"

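This doesn't work because the lhs of d["children"] <- add is "1 var. with 2 obs." but the rhs is "a list of 2". R treats the rhs as 2 variables, keeps only the first one, list("Mary", "James"), and spreads it over the 2 rows.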
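A small sketch, using base R only, that captures the warning instead of printing it and then checks what actually lands in the column. The variable names msg and kept are only for illustration.

```r
d <- data.frame(id = 1:2, name = c("Jon", "Mark"))
add <- list(list("Mary", "James"), list("Greta", "Sally"))

# Record the warning text and let the assignment carry on
msg <- NULL
withCallingHandlers(
  d["children"] <- add,
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

# Flatten the stored column to see which names survived
kept <- unlist(d[["children"]])
print(msg)
print(kept)
```

Only the first element of add survives, which matches the Mary / James rows in the printed data frame above.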

> d <- data.frame(id=1:2, name=c("Jon", "Mark"))
> add <- list(list(list("Mary", "James"), list("Greta", "Sally")))
> d["children"] <- add
> d
  id name     children
1  1  Jon  Mary, James
2  2 Mark Greta, Sally
> str(d["children"])
'data.frame':   2 obs. of  1 variable:     # d["children"] is 1 var. with 2 obs.
 $ children:List of 2
  ..$ :List of 2
  .. ..$ : chr "Mary"
  .. ..$ : chr "James"
  ..$ :List of 2
  .. ..$ : chr "Greta"
  .. ..$ : chr "Sally"
> str(add)
List of 1                                  # add is 1 list with 2 lists
 $ :List of 2
  ..$ :List of 2
  .. ..$ : chr "Mary"
  .. ..$ : chr "James"
  ..$ :List of 2
  .. ..$ : chr "Greta"
  .. ..$ : chr "Sally"

The nomenclature is a little illogical here, but if you accept that a list has to be inside a list to count as a list, then the above works because the lhs of d["children"] <- add is "1 var. with 2 obs." and the rhs is "1 list with 2 lists". Note the symmetry 1var:2lists::1list:2lists.

> d <- data.frame(id=1:2, name=c("Jon", "Mark"))
> d
  id name
1  1  Jon
2  2 Mark
> add <- list(list("Mary", "James"), list("Greta", "Sally"))
> d[["children"]] <- add
> d
  id name     children
1  1  Jon  Mary, James
2  2 Mark Greta, Sally
> str(d[["children"]])
List of 2                                  # d[["children"]] is a list of 2
 $ :List of 2
  ..$ : chr "Mary"
  ..$ : chr "James"
 $ :List of 2
  ..$ : chr "Greta"
  ..$ : chr "Sally"
> str(add)
List of 2                                  # add is a list of 2
 $ :List of 2
  ..$ : chr "Mary"
  ..$ : chr "James"
 $ :List of 2
  ..$ : chr "Greta"
  ..$ : chr "Sally"

Like the first example, this works because the lhs and rhs of d[["children"]] <- add are both lists with 2 items.

I'm still not sure what the enclosing structure of add should be called in the cases where str(add) evaluates to List of 2..., but that might not be important.

Upvotes: 0
