Tom McMahon

Why does subsetting change with tbl_df in dplyr?

I've discovered some strange behaviour when subsetting dplyr tbl_df data frames. When I subset a data frame with the matrix-style df[,'a'], it returns a vector, as expected. However, when I do the same thing to a tbl_df data frame, it returns a data frame instead.

I've replicated it below using the iris data set.

Can someone explain why this is happening, or how I can convert a tbl_df back to a plain data frame? I need dplyr and readr earlier in the pipeline, before the point where I need this behaviour.

library(dplyr)
data(iris)

str(iris['Sepal.Length'])
## 'data.frame':   150 obs. of  1 variable:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

str(iris[,'Sepal.Length'])
##  num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

iris <- tbl_df(iris)

str(iris[,'Sepal.Length'])
## Classes ‘tbl_df’ and 'data.frame':  150 obs. of  1 variable:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

Answers (1)

mathematical.coffee

This is on purpose.

See ?tbl_df:

Methods:

‘tbl_df’ implements two important base methods:

print Only prints the first 10 rows, and the columns that fit on screen

‘[’ Never simplifies (drops), so always returns data.frame

(emphasis added)
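A minimal sketch of the no-drop behaviour and the usual ways to pull a plain vector out of a tibble. It assumes a recent dplyr/tibble, where tbl_df() has been superseded by tibble::as_tibble(); the drop = TRUE argument to `[` and dplyr::pull() come from those newer versions and may not exist in the version used in the question.

```r
library(dplyr)   # provides pull()
library(tibble)  # as_tibble() is the current replacement for tbl_df()

tb <- as_tibble(iris)

# `[` on a tibble never drops: selecting one column gives a one-column tibble
one_col <- tb[, "Sepal.Length"]
is_tib <- inherits(one_col, "tbl_df")

# Ways to get a plain numeric vector instead
v1 <- tb[["Sepal.Length"]]              # [[ always extracts a single column
v2 <- tb$Sepal.Length                    # $ does the same for a literal name
v3 <- tb[, "Sepal.Length", drop = TRUE]  # opt in to dropping explicitly
v4 <- pull(tb, Sepal.Length)             # dplyr verb for the same extraction

str(v1)
```

Using [[ or $ is the most portable choice, since it behaves the same on plain data frames and tibbles.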

If you run class(tbl_df(iris)) you will see that its class is "tbl_df", then "tbl", and finally "data.frame", so it can have its own [ method; indeed, methods(class='tbl_df') shows [.tbl_df.

(This is similar to how data.tables in the data.table package have their own [ method.)
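To see the dispatch mechanism at work, here is a small sketch using tibble::as_tibble() (assumed as the modern stand-in for tbl_df()). The describe() generic and its two methods are hypothetical, defined only to show that S3 walks class(x) from left to right and calls the first method it finds, which is why "tbl_df" wins over "data.frame".

```r
library(tibble)

tb <- as_tibble(iris)
cls <- class(tb)
print(cls)  # "tbl_df" "tbl" "data.frame"

# tibble registers its own `[` method for the "tbl_df" class
has_own_subset <- !is.null(getS3method("[", "tbl_df", optional = TRUE))

# Hypothetical generic, for illustration only
describe <- function(x) UseMethod("describe")
describe.data.frame <- function(x) "data.frame method"
describe.tbl_df <- function(x) "tbl_df method"

describe(iris)  # plain data frame: only "data.frame" matches
describe(tb)    # "tbl_df" comes first in class(tb), so its method is used
```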


edit : to un-tbl_df , just use data.frame, e.g. data.frame(tbl_df(iris)) will convert the tbl_df(..) back to data.frame.
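A short sketch of the round trip, assuming tibble::as_tibble() in place of the older tbl_df(). Once the object is a plain data.frame again, base `[` semantics return, so a single-column selection drops to a vector. as.data.frame() is shown as a base alternative that gives the same result here.

```r
library(tibble)

tb <- as_tibble(iris)

df1 <- data.frame(tb)      # the conversion suggested above
df2 <- as.data.frame(tb)   # base alternative; same result for iris

class(df1)  # "data.frame"

# Base `[` semantics are back: a single column drops to a vector
x <- df1[, "Sepal.Length"]
str(x)
```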
