RitaM

dplyr: Filter based on a vector

I assume the error is in the code and therefore I think this example is enough.

I wanted to filter my data frame (df2) according to a vector I created. This vector was created by extracting a column from another data frame (df1).

Vector based on df1 (the 3rd column of df1):

 vector_df1 <- df1 [, 3]

Applying the filter to df2 based on vector_df1:

Filter_df2 <- df2 %>%
  filter(Column_df2 %in% vector_df1)

Result: 0 rows.

Can someone help me see what I'm doing wrong?

Thanks in advance.


Answers (1)

akrun

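This comes down to the structure of the dataset. With a data.frame, [, col] uses drop = TRUE and coerces the result to a vector, while for a data.table or tibble the default behaves like drop = FALSE, so you get back the table itself with a single column. Because a tibble is a list, %in% then compares each value of Column_df2 against that whole column as one element instead of against its values, so nothing matches and filter returns 0 rows. The documentation can be found in ?Extract. The safe option is [[, which extracts the column as a vector in all of these cases:

vector_df1 <- df1[[3]]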
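A minimal base-R sketch of the difference, using hypothetical toy versions of df1 and df2 (the column names and values are made up). To avoid depending on the tibble package, drop = FALSE stands in for the tibble default of returning a one-column table, and base logical subsetting replaces dplyr::filter; the %in% comparison that causes the 0 rows is the same.

```r
# Hypothetical toy data standing in for the question's df1 and df2
df1 <- data.frame(id = 1:3, name = c("a", "b", "c"), code = c("x", "y", "z"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Column_df2 = c("x", "q", "z"), value = 1:3,
                  stringsAsFactors = FALSE)

# data.frame default (drop = TRUE): [, 3] gives a plain character vector
v_drop <- df1[, 3]

# drop = FALSE mimics the tibble default: a one-column data frame comes back
v_frame <- df1[, 3, drop = FALSE]

# [[ always extracts the column itself as a vector
v_col <- df1[[3]]

# %in% against the one-column data frame treats the whole column as a
# single element, so no value of Column_df2 matches
rows_frame <- df2[df2$Column_df2 %in% v_frame, ]

# %in% against the extracted vector keeps the rows whose value is in it
rows_col <- df2[df2$Column_df2 %in% v_col, ]

nrow(rows_frame)  # 0
nrow(rows_col)    # 2
```

With dplyr, the question's pipeline can stay exactly as written; only the line that builds vector_df1 needs to use df1[[3]].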

According to ?Extract, the default usage is

x[i, j, ... , drop = TRUE]

and the drop argument is described as

drop: For matrices and arrays. If TRUE the result is coerced to the lowest possible dimension (see the examples). This only works for extracting elements, not for the replacement. See drop for further details.
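A small sketch of what "coerced to the lowest possible dimension" means, using a plain matrix (the matrix m is a made-up example): selecting one column with the default drop = TRUE returns a vector, while drop = FALSE keeps the matrix shape.

```r
# A 2 x 3 integer matrix; its columns are (1, 2), (3, 4), (5, 6)
m <- matrix(1:6, nrow = 2)

col_dropped <- m[, 3]                # default drop = TRUE: plain vector c(5L, 6L)
col_kept    <- m[, 3, drop = FALSE]  # dimension kept: a 2 x 1 matrix

is.matrix(col_dropped)  # FALSE
dim(col_kept)           # 2 1
```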

The documentation for tibble can be found in ?"tbl_df-class"

df[, j] returns a tibble; it does not automatically extract the column inside. df[, j, drop = FALSE] is the default. Read more in subsetting.
