robertspierre

Reputation: 4457

Selecting a single column from a tibble still returns a tibble instead of a vector

I have a tibble called df:

> class(df)
[1] "tbl_df"     "tbl"        "data.frame"

Now I run:

> sen <- df[df$my_dummy==0, "col_name"]

This returns another tibble:

> class(sen)
[1] "tbl_df"     "tbl"        "data.frame"

Why?

How can I get a plain numeric vector out of it?

Upvotes: 2

Views: 1650

Answers (4)

GuedesBF

Reputation: 9878

Tibbles are more restrictive (for type safety), as you can see in the vignette suggested by @user13032723.

For a comparison of the behaviour of tibbles and data.frames, see the outputs you get from the same subsetting operations over the two classes:

# A tibble: 9 x 2
  expressions                                     output                 
  <chr>                                           <chr>                  
1 iris['Sepal.Length'] %>% class                  data.frame             
2 iris[,'Sepal.Length'] %>% class                 numeric  ###              
3 iris[['Sepal.Length']] %>% class                numeric                
4 iris$Sepal.Length %>% class                     numeric                
5 iris %>% pull('Sepal.Length') %>% class         numeric                
6 tibble(iris)['Sepal.Length'] %>% class          tbl_df, tbl, data.frame
7 tibble(iris)[,'Sepal.Length'] %>% class         tbl_df, tbl, data.frame ###
8 tibble(iris)[['Sepal.Length']] %>% class        numeric                
9 tibble(iris) %>% pull('Sepal.Length') %>% class numeric  

As you can see, tibbles handle subsetting differently. Single-bracket subsetting ([x, y]) on a tibble does not simplify the result to a vector by default, whereas on a data.frame it does when a single column is selected. Compare rows 2 and 7 of the table (marked ###).

To consistently extract vectors from data frames, use the $ operator, double brackets [[ ]], or pull().
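Applied to the situation in the question, the base-R options might look like the sketch below. The tibble here is a made-up stand-in for the asker's df (only the column names my_dummy and col_name come from the question), and drop = TRUE is the documented opt-in for simplifying a single-column tibble result to a vector.

```r
library(tibble)

# Hypothetical stand-in for the asker's data; the values are invented
df <- tibble(my_dummy = c(0, 1, 0, 1),
             col_name = c(1.5, 2.5, 3.5, 4.5))

rows <- df$my_dummy == 0

# Single brackets keep the tibble class by default
still_tibble <- df[rows, "col_name"]

# Three ways to get a bare vector instead
via_drop    <- df[rows, "col_name", drop = TRUE]  # explicit opt-in to simplification
via_double  <- df[rows, ][["col_name"]]           # filter rows, then extract the column
via_dollar  <- df$col_name[rows]                  # extract the column, then filter it

class(still_tibble)
class(via_drop)
identical(via_drop, via_double) && identical(via_double, via_dollar)
```

All three vector forms should agree with each other; which one to use is mostly a matter of taste, although [[ ]] and $ behave the same way on tibbles and data.frames.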

Code

library(purrr)  # map_chr()
library(dplyr)  # tibble(), mutate(), pull(), %>%

# Each string is an R expression; evaluate it and record the resulting class
tibble(expressions = c("iris['Sepal.Length'] %>% class",
                       "iris[,'Sepal.Length'] %>% class",
                       "iris[['Sepal.Length']] %>% class",
                       "iris$Sepal.Length %>% class",
                       "iris %>% pull('Sepal.Length') %>% class",
                       "tibble(iris)['Sepal.Length'] %>% class",
                       "tibble(iris)[,'Sepal.Length'] %>% class",
                       "tibble(iris)[['Sepal.Length']] %>% class",
                       "tibble(iris) %>% pull('Sepal.Length') %>% class")) %>%
  # toString() collapses multi-element class vectors into one string
  mutate(output = map_chr(expressions,
                          ~ eval(parse(text = .x)) %>% toString))

Upvotes: 2

Ronak Shah

Reputation: 389325

Use $ to extract col_name as a vector first, then subset that vector:

sen <- df$col_name[df$my_dummy == 0]

Upvotes: 1

gss

Reputation: 1453

As for the "why?":

This is a conscious design decision. You can read about it in the vignette: Tibbles.

Being "strict" is mentioned there. If you have a data.frame, the drop argument defaults to TRUE, but that does not mean the data.frame is always dropped and you get a vector. See this example:

df <- data.frame(a = 1, b = 2, c = 3)

var <- "a"

df[, var] # numeric vector

var <- c("a", "b")

df[, var] # data.frame

The value of var can change at runtime, and with it the type that df[, var] returns. This change of behavior, which depends on something you can't predict when writing the code, is the reason tibbles never simplify by default.
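For contrast, here is a small sketch of the same selection on a tibble, where the return type stays the same regardless of how many columns var names. The object names df_base and df_tbl are illustrative; the data mirror the example above.

```r
library(tibble)

df_base <- data.frame(a = 1, b = 2, c = 3)
df_tbl  <- as_tibble(df_base)

one_col  <- "a"
two_cols <- c("a", "b")

# data.frame: the return type depends on the length of the selection
base_one <- df_base[, one_col]    # numeric vector
base_two <- df_base[, two_cols]   # data.frame

# tibble: always a tibble, whatever the selection
tbl_one <- df_tbl[, one_col]
tbl_two <- df_tbl[, two_cols]

# A data.frame can be made equally predictable with drop = FALSE
base_one_kept <- df_base[, one_col, drop = FALSE]

is.data.frame(base_one)       # FALSE
is.data.frame(base_two)       # TRUE
is.data.frame(tbl_one)        # TRUE
is.data.frame(tbl_two)        # TRUE
is.data.frame(base_one_kept)  # TRUE
```

In code that receives var from elsewhere, the tibble behavior (or an explicit drop = FALSE) means downstream steps can rely on getting a data frame back.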

Upvotes: 0

Kra.P

Reputation: 15153

Try pull() from dplyr:

library(dplyr)

sen <- df %>%
  filter(my_dummy == 0) %>%
  pull(col_name)

Upvotes: 4
