Reputation: 597
I've discovered some strange behaviour when subsetting dplyr tbl_df data frames. When I subset a plain data frame with the 'matrix' style df[,'a'], it returns a vector, as expected. However, when I do the same thing on a tbl_df data frame, it returns a data frame instead.
I've replicated it below using the iris data set.
Can someone explain why this is happening, or how I can convert a tbl_df back to a plain data frame? I need to use dplyr and readr in the steps leading up to this, so I can't simply avoid tbl_df.
library(dplyr)
data(iris)
str(iris['Sepal.Length'])
'data.frame': 150 obs. of 1 variable:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
str(iris[,'Sepal.Length'])
num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
iris <- tbl_df(iris)
str(iris[,'Sepal.Length'])
Classes ‘tbl_df’ and 'data.frame': 150 obs. of 1 variable:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
Upvotes: 5
Views: 1036
Reputation: 56915
This is on purpose.
See ?tbl_df:
Methods:
‘tbl_df’
implements two important base methods:
‘[’
Never simplifies (drops), so always returns data.frame
(emphasis added)
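The "simplifies (drops)" wording refers to the drop argument of base R's [ for data frames. As a minimal sketch using only base R, passing drop = FALSE to a plain data frame gives the same kind of result that [ always gives on a tbl_df (the variable names dropped and kept are just for illustration):

```r
data(iris)

# Default matrix-style subsetting of a single column simplifies to a vector
dropped <- iris[, "Sepal.Length"]
is.numeric(dropped)      # TRUE
is.data.frame(dropped)   # FALSE

# drop = FALSE keeps the data frame structure instead of simplifying
kept <- iris[, "Sepal.Length", drop = FALSE]
is.data.frame(kept)      # TRUE
dim(kept)                # 150 1
```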
If you run class(tbl_df(iris)), you will see that its class is "tbl_df", then "tbl", and finally "data.frame". Because "tbl_df" comes first, it can have its own [ method that is used in preference to the data.frame one, and methods(class='tbl_df') indeed shows [.tbl_df.
(It's a bit like how data.tables in the data.table package have a different [ method too.)
Edit: to un-tbl_df, just use data.frame, e.g. data.frame(tbl_df(iris)) will convert the tbl_df(..) back to a plain data.frame. as.data.frame() does the same job.
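For completeness, here is a hedged sketch of two ways to get a plain vector out of a tbl_df: convert it back first, or keep the tbl_df and extract the column with [[ or $, which return the column itself and not a one-column data frame. It assumes a dplyr version that re-exports as_tibble() (newer versions deprecate tbl_df() in its favour); the names iris_tbl, iris_df, v1, v2 and v3 are only for illustration.

```r
library(dplyr)

# as_tibble() stands in for tbl_df() here; both produce a tbl_df
iris_tbl <- as_tibble(iris)

# Option 1: convert back to a plain data frame, then subset matrix-style
iris_df <- as.data.frame(iris_tbl)
class(iris_df)                      # "data.frame"
v1 <- iris_df[, "Sepal.Length"]     # numeric vector

# Option 2: keep the tbl_df and pull the column out with [[ or $
v2 <- iris_tbl[["Sepal.Length"]]
v3 <- iris_tbl$Sepal.Length
```

Option 2 avoids the conversion step, which can be convenient if the rest of the pipeline still relies on dplyr verbs.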
Upvotes: 5