Reputation: 1337
I want to populate a column of a data frame with lists. Based on the example code I found here:
d <- data.frame(id=1:2, name=c("Jon", "Mark"))
d
d$children <- list(list("Mary", "James"), list("Greta", "Sally"))
d
I expected that the following code would work:
d <- data.frame(id=1:2, name=c("Jon", "Mark"))
d
d["children"] <- list(list("Mary", "James"), list("Greta", "Sally"))
d
but it gave the following warning:
Warning message:
In `[<-.data.frame`(`*tmp*`, "children", value = list(list("Mary", :
provided 2 variables to replace 1 variables
Based on reading this post and this answer I changed the code to this:
d <- data.frame(id=1:2, name=c("Jon", "Mark"))
d
d["children"] <- list(list(list("Mary", "James"), list("Greta", "Sally")))
d
which worked perfectly. The question is, what is going on here? What does the extra call to list() accomplish? Thanks
Upvotes: 2
Reputation: 11957
There are a couple of things happening here. R behaves differently when indexing with single brackets [ ] or double brackets [[ ]]. In short, single-bracket indexing into a data frame returns a data frame (itself a list of columns) and, on assignment, treats a list value as a list of columns. Double-bracket indexing returns, or replaces, the underlying column itself.
Note that the first example below, with single-brackets, retains the data frame column's structure and naming, while the double-bracket example returns the column's primitive contents as a vector.
> str(mtcars['mpg'])
'data.frame': 32 obs. of 1 variable:
$ mpg: num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
> str(mtcars[['mpg']])
num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
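The same distinction can be checked programmatically rather than by reading str output. This is only a small sketch; the variable names single and double are made up for the illustration.

```r
# mtcars ships with base R (datasets package), so no setup is needed.
single <- mtcars["mpg"]    # single brackets: a one-column data frame
double <- mtcars[["mpg"]]  # double brackets: the column itself

is.data.frame(single)  # TRUE
is.list(single)        # TRUE, because a data frame is a list of columns
is.data.frame(double)  # FALSE
is.numeric(double)     # TRUE
```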
To answer your question of why the seemingly superfluous call to list() helps at all, str can shed some light on the matter:
Your original code, without the extra list(), is a list of length 2:
> str(list(list("Mary", "James"), list("Greta", "Sally")))
List of 2
$ :List of 2
..$ : chr "Mary"
..$ : chr "James"
$ :List of 2
..$ : chr "Greta"
..$ : chr "Sally"
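This fails because d['children'] selects a single column, so the assigned list is read as a list of columns and needs to have length 1. Here it has length 2, so R warns and keeps only the first element, list("Mary", "James"). Adding the extra list() creates an "outer" list of length 1 whose single element is the whole two-row column, so the assignment succeeds.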
> str(list(list(list("Mary", "James"), list("Greta", "Sally"))))
List of 1
$ :List of 2
..$ :List of 2
.. ..$ : chr "Mary"
.. ..$ : chr "James"
..$ :List of 2
.. ..$ : chr "Greta"
.. ..$ : chr "Sally"
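A minimal sketch of the length argument above, using the question's data. The name kids is introduced here only for illustration.

```r
d <- data.frame(id = 1:2, name = c("Jon", "Mark"))
kids <- list(list("Mary", "James"), list("Greta", "Sally"))

length(kids)        # 2: single-bracket assignment reads this as two columns
length(list(kids))  # 1: one column, whose value has one element per row

# Wrapping in an outer list supplies exactly one column for the one slot.
d["children"] <- list(kids)
length(d[["children"]])  # 2, one list of children per row
```

The outer list() only tells the single-bracket assignment how many columns are being supplied; it does not end up stored in the data frame.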
Finally, your original code (without the extra list()) would have worked had you used double-bracket indexing:
d[["children"]] <- list(list("Mary", "James"), list("Greta", "Sally"))
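As a quick check, double-bracket assignment and the $ form from the question can be run side by side. This is a sketch only; d1, d2 and kids are names chosen for the example.

```r
kids <- list(list("Mary", "James"), list("Greta", "Sally"))

# Double-bracket assignment replaces the column itself.
d1 <- data.frame(id = 1:2, name = c("Jon", "Mark"))
d1[["children"]] <- kids

# The $ form from the question does the same kind of replacement.
d2 <- data.frame(id = 1:2, name = c("Jon", "Mark"))
d2$children <- kids

# Both columns hold one list of children per row.
str(d1[["children"]])
str(d2$children)
```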
Upvotes: 1
Reputation: 1337
@jdobres's answer got me playing with the following examples, which helped me understand (kinda) what's going on.
> d <- data.frame(id=1:2, name=c("Jon", "Mark"))
> d
id name
1 1 Jon
2 2 Mark
> add <- list(list("Mary", "James"), list("Greta", "Sally"))
> d$children <- add
> d
id name children
1 1 Jon Mary, James
2 2 Mark Greta, Sally
> str(d$children)
List of 2 # d$children is a list of 2
$ :List of 2
..$ : chr "Mary"
..$ : chr "James"
$ :List of 2
..$ : chr "Greta"
..$ : chr "Sally"
> str(add)
List of 2 # add is a list of 2
$ :List of 2
..$ : chr "Mary"
..$ : chr "James"
$ :List of 2
..$ : chr "Greta"
..$ : chr "Sally"
This works because the lhs and rhs of d$children <- add are both lists with 2 items.
> d <- data.frame(id=1:2, name=c("Jon", "Mark"))
> d
id name
1 1 Jon
2 2 Mark
> add <- list(list("Mary", "James"), list("Greta", "Sally"))
> d["children"] <- add
Warning message:
In `[<-.data.frame`(`*tmp*`, "children", value = list(list("Mary", :
provided 2 variables to replace 1 variables
> d
id name children
1 1 Jon Mary
2 2 Mark James
> str(d["children"])
'data.frame': 2 obs. of 1 variable: # d["children"] is 1 var. with 2 obs.
$ children:List of 2
..$ : chr "Mary"
..$ : chr "James"
> str(add)
List of 2 # add is a list of 2
$ :List of 2
..$ : chr "Mary"
..$ : chr "James"
$ :List of 2
..$ : chr "Greta"
..$ : chr "Sally"
This doesn't work because the lhs of d["children"] <- add is "1 var. with 2 obs." but the rhs is "a list of 2". R reads the rhs as 2 variables, keeps the first one and warns about the second.
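The warning and the truncated column can also be captured in a script instead of read off the console. This is a rough sketch using base R's withCallingHandlers; the variable msg is made up for the illustration.

```r
d <- data.frame(id = 1:2, name = c("Jon", "Mark"))
add <- list(list("Mary", "James"), list("Greta", "Sally"))

msg <- NULL
withCallingHandlers(
  {
    d["children"] <- add  # two "variables" offered for one column
  },
  warning = function(w) {
    msg <<- conditionMessage(w)     # keep the warning text for inspection
    invokeRestart("muffleWarning")  # continue without printing it
  }
)

msg                      # the "provided 2 variables ..." text shown above
length(d[["children"]])  # 2, but these are "Mary" and "James" only
```

Only the first inner list made it into the column, so Greta and Sally are silently dropped apart from the warning.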
> d <- data.frame(id=1:2, name=c("Jon", "Mark"))
> add <- list(list(list("Mary", "James"), list("Greta", "Sally")))
> d["children"] <- add
> d
id name children
1 1 Jon Mary, James
2 2 Mark Greta, Sally
> str(d["children"])
'data.frame': 2 obs. of 1 variable: # d["children"] is 1 var. with 2 obs.
$ children:List of 2
..$ :List of 2
.. ..$ : chr "Mary"
.. ..$ : chr "James"
..$ :List of 2
.. ..$ : chr "Greta"
.. ..$ : chr "Sally"
> str(add)
List of 1 # add is 1 list with 2 lists
$ :List of 2
..$ :List of 2
.. ..$ : chr "Mary"
.. ..$ : chr "James"
..$ :List of 2
.. ..$ : chr "Greta"
.. ..$ : chr "Sally"
The nomenclature is a little illogical here, but if you accept that a list has to be inside a list to count as a list, then the above works because the lhs of d["children"] <- add is "1 var. with 2 obs." and the rhs is "1 list with 2 lists". Note the symmetry 1var:2lists::1list:2lists.
> d <- data.frame(id=1:2, name=c("Jon", "Mark"))
> d
id name
1 1 Jon
2 2 Mark
> add <- list(list("Mary", "James"), list("Greta", "Sally"))
> d[["children"]] <- add
> d
id name children
1 1 Jon Mary, James
2 2 Mark Greta, Sally
> str(d[["children"]])
List of 2 # d[["children"]] is a list of 2
$ :List of 2
..$ : chr "Mary"
..$ : chr "James"
$ :List of 2
..$ : chr "Greta"
..$ : chr "Sally"
> str(add)
List of 2 # add is a list of 2
$ :List of 2
..$ : chr "Mary"
..$ : chr "James"
$ :List of 2
..$ : chr "Greta"
..$ : chr "Sally"
Like the first example, this works because the lhs and rhs of d[["children"]] <- add are both lists with 2 items.
I'm still not sure what the enclosing structure of add should be called in the cases where str(add) evaluates to List of 2..., but that might not be important.
Upvotes: 0