Reputation: 24675

dplyr filter by the first column

Is it possible to filter in dplyr by the position of a column?

I know how to do it without dplyr

iris[iris[,1]>6,]

But how can I do it in dplyr?

Thanks!

Upvotes: 7

Answers (4)

BMLopes

Reputation: 606

Not very elegant, but you can rename the variable, and use the new name on a dplyr pipe.

library(dplyr)

iris_copy <- iris
original_names <- names(iris_copy)

Renaming the first variable:

names(iris_copy)[1] <- "col1"

Filtering the first variable:

iris_copy |> filter(col1 > 6)

If you need the original variable name:

names(iris_copy) <- original_names
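Note that the last step restores the names on iris_copy, not on the filtered result. Here is a minimal sketch of the same idea in one pipeline. It assumes rename() accepts a column position on its right-hand side (tidy-select) and R >= 4.1 for the native pipe; first_name and filtered are illustrative names, not part of the steps above:

```r
library(dplyr)

# Remember the real name of the first column
first_name <- names(iris)[1]

filtered <- iris |>
  rename(col1 = 1) |>   # temporary name, chosen by column position
  filter(col1 > 6)

# Put the original name back on the filtered result
names(filtered)[1] <- first_name
```

This leaves iris itself untouched, so no copy is needed.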

Upvotes: 0

LMc

Reputation: 18642

dplyr >= 1.0.0

Scoped verbs (_if, _at, _all) and by extension all_vars() and any_vars() have been superseded by across(). In the case of filter the functions if_any and if_all have been created to combine logic across multiple columns to aid in subsetting (these verbs are available in dplyr >= 1.0.4):

if_any() and if_all() are used to apply the same predicate function to a selection of columns and combine the results into a single logical vector.

The first argument to across, if_any, and if_all is still tidy-select syntax for column selection, which includes selection by column position.

Single Column

In your single column case you could use any of the following with the same result:

iris %>% 
  filter(across(1, ~ . > 6))

iris %>% 
  filter(if_any(1, ~ . > 6))

iris %>% 
  filter(if_all(1, ~ . > 6))

Multiple Columns

If you're applying a predicate function or formula across multiple columns, then across might give unexpected results; in this case you should use if_any and if_all:

iris %>% 
  filter(if_all(c(2, 4), ~ . > 2.3)) # by column position

  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1          6.3         3.3          6.0         2.5 virginica
2          7.2         3.6          6.1         2.5 virginica
3          5.8         2.8          5.1         2.4 virginica
4          6.3         3.4          5.6         2.4 virginica
5          6.7         3.1          5.6         2.4 virginica
6          6.7         3.3          5.7         2.5 virginica

Notice this returns rows where all selected columns have a value greater than 2.3, which is a subset of rows where any of the selected columns meet the logic:

iris %>% 
  filter(if_any(ends_with("Width"), ~ . > 2.3)) # same columns selection as above

   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1           5.1         3.5          1.4         0.2    setosa
2           4.9         3.0          1.4         0.2    setosa
3           4.7         3.2          1.3         0.2    setosa
4           4.6         3.1          1.5         0.2    setosa
5           5.0         3.6          1.4         0.2    setosa
6           6.7         3.3          5.7         2.5 virginica
7           6.7         3.0          5.2         2.3 virginica
8           6.3         2.5          5.0         1.9 virginica
9           6.5         3.0          5.2         2.0 virginica
10          6.2         3.4          5.4         2.3 virginica
11          5.9         3.0          5.1         1.8 virginica

The output above was shortened to be more compact for this example.
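To check the subset relationship without reading through printed rows, here is a small sketch that compares row counts against base R equivalents. It assumes dplyr >= 1.0.4 for if_any and if_all; all_rows and any_rows are illustrative names:

```r
library(dplyr)

# Rows where BOTH width columns (positions 2 and 4) exceed 2.3
all_rows <- iris %>% filter(if_all(c(2, 4), ~ . > 2.3))

# Rows where AT LEAST ONE of the two width columns exceeds 2.3
any_rows <- iris %>% filter(if_any(c(2, 4), ~ . > 2.3))

# Base R equivalents: if_all combines the columns with &, if_any with |
nrow(all_rows) == sum(iris[[2]] > 2.3 & iris[[4]] > 2.3)
nrow(any_rows) == sum(iris[[2]] > 2.3 | iris[[4]] > 2.3)

# Every row matched by if_all is also matched by if_any
nrow(all_rows) <= nrow(any_rows)
```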

Upvotes: 7

Scransom

Reputation: 3335

No magic, just use the column number as in your base R example, rather than the variable (column) name:

library("dplyr")

iris %>%
  filter(iris[,1] > 6)

Which as @eipi10 commented is better as

iris %>%
  filter(.[[1]] > 6)
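The difference matters once an earlier step in the pipe has changed the data. iris[, 1] always refers to the original 150-row data frame, whereas . is the data frame arriving from the previous step. This relies on the magrittr %>% pipe. A minimal sketch, with virginica_long as an illustrative name:

```r
library(dplyr)

virginica_long <- iris %>%
  filter(Species == "virginica") %>%  # only the virginica rows reach the next step
  filter(.[[1]] > 6)                  # . is that reduced data frame, not the full iris

# Using iris[, 1] > 6 in the second filter() would supply 150 logical values
# for a 50-row input, which filter() rejects with a size error.
```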

Upvotes: 9

akuiper

Reputation: 214967

Besides the suggestion by @thelatemail, you can also use filter_at and pass the column number as the .vars argument:

iris %>% filter_at(1, all_vars(. > 6))

all(iris %>% filter_at(1, all_vars(. > 6)) == iris[iris[,1] > 6, ])
# [1] TRUE

Upvotes: 15
