Reputation: 24675
Is it possible to filter in dplyr by the position of a column?
I know how to do it without dplyr
iris[iris[,1]>6,]
But how can I do it in dplyr?
Thanks!
Upvotes: 7
Views: 17142
Reputation: 606
Not very elegant, but you can rename the variable and use the new name in a dplyr pipe.
iris_copy <- iris
original_names <- names(iris_copy)
Renaming the first variable:
names(iris_copy)[1] <- "col1"
Filtering the first variable:
iris_copy |> filter(col1 > 6)
If you need the original variable name:
names(iris_copy) <- original_names
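Put together, the steps above might look like the sketch below. It assumes R >= 4.1 for the native pipe, and it stores the filtered rows in a variable (`filtered`, a name chosen here for illustration) so that the original names can be restored on the result rather than only on the copy.

```r
library(dplyr)

iris_copy <- iris
original_names <- names(iris_copy)

# Temporarily give the first column a known name
names(iris_copy)[1] <- "col1"

# Filter on the temporary name and keep the result
filtered <- iris_copy |> filter(col1 > 6)

# Restore the original column names on the filtered result
names(filtered) <- original_names
head(filtered)
```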
Upvotes: 0
Reputation: 18642
Scoped verbs (_if, _at, _all) and by extension all_vars() and any_vars() have been superseded by across(). In the case of filter, the functions if_any and if_all have been created to combine logic across multiple columns to aid in subsetting (these verbs are available in dplyr >= 1.0.4):
if_any() and if_all() are used to apply the same predicate function to a selection of columns and combine the results into a single logical vector.
The first argument to across, if_any, and if_all is still tidy-select syntax for column selection, which includes selection by column position.
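As a rough illustration of selecting by position, the sketch below passes a single position, a range of positions, and a vector of non-adjacent positions. The thresholds and the result names (`one_col`, `col_range`, `col_pick`) are arbitrary choices for this example.

```r
library(dplyr)

# A single position: column 1 must exceed 6
one_col <- iris %>% filter(if_all(1, ~ .x > 6))

# A range of positions: columns 1 and 2 must both exceed 3
col_range <- iris %>% filter(if_all(1:2, ~ .x > 3))

# Non-adjacent positions: column 1 or column 3 must exceed 6.5
col_pick <- iris %>% filter(if_any(c(1, 3), ~ .x > 6.5))
```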
Single Column
In your single-column case you could use any of the following with the same result:
iris %>%
filter(across(1, ~ . > 6))
iris %>%
filter(if_any(1, ~ . > 6))
iris %>%
filter(if_all(1, ~ . > 6))
Multiple Columns
If you're applying a predicate function or formula across multiple columns, then across might give unexpected results (recent dplyr releases also deprecate across() inside filter()), so in this case you should use if_any and if_all:
iris %>%
filter(if_all(c(2, 4), ~ . > 2.3)) # by column position
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 6.3 3.3 6.0 2.5 virginica
2 7.2 3.6 6.1 2.5 virginica
3 5.8 2.8 5.1 2.4 virginica
4 6.3 3.4 5.6 2.4 virginica
5 6.7 3.1 5.6 2.4 virginica
6 6.7 3.3 5.7 2.5 virginica
Notice this returns rows where all selected columns have a value greater than 2.3, which is a subset of rows where any of the selected columns meet the logic:
iris %>%
filter(if_any(ends_with("Width"), ~ . > 2.3)) # same columns selection as above
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 6.7 3.3 5.7 2.5 virginica
7 6.7 3.0 5.2 2.3 virginica
8 6.3 2.5 5.0 1.9 virginica
9 6.5 3.0 5.2 2.0 virginica
10 6.2 3.4 5.4 2.3 virginica
11 5.9 3.0 5.1 1.8 virginica
The output above was shortened to be more compact for this example.
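One way to check the subset relationship without reading the printed rows is to compare row counts, as in this small sketch (the variable names are illustrative only):

```r
library(dplyr)

all_rows <- iris %>% filter(if_all(c(2, 4), ~ .x > 2.3))
any_rows <- iris %>% filter(if_any(c(2, 4), ~ .x > 2.3))

# if_all() keeps the 6 rows shown in the first output;
# if_any() keeps at least as many
c(all = nrow(all_rows), any = nrow(any_rows))
```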
Upvotes: 7
Reputation: 3335
No magic, just use the column number as above, rather than the variable (column) name:
library("dplyr")
iris %>%
filter(iris[,1] > 6)
Which, as @eipi10 commented, is better written as
iris %>%
filter(.[[1]] > 6)
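Note that the `.` placeholder is a feature of the magrittr pipe (`%>%`), so the form above is not expected to work with the native `|>` pipe. A possible alternative, sketched here, is to look up the column name from its position and use the `.data` pronoun. The helper `filter_by_position` is hypothetical, not part of dplyr, and the sketch assumes R >= 4.1.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): keep rows where the column
# at position `pos` is greater than `threshold`
filter_by_position <- function(df, pos, threshold) {
  col_name <- names(df)[pos]  # translate the position into a name
  df |> filter(.data[[col_name]] > threshold)
}

res <- filter_by_position(iris, 1, 6)
head(res)
```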
Upvotes: 9
Reputation: 214967
Besides the suggestion by @thelatemail, you can also use filter_at and pass the column number to the vars parameter:
iris %>% filter_at(1, all_vars(. > 6))
all(iris %>% filter_at(1, all_vars(. > 6)) == iris[iris[,1] > 6, ])
# [1] TRUE
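Since the scoped verbs are superseded in current dplyr, the same filter can probably be written with if_all() (dplyr >= 1.0.4). The sketch below compares the two forms; `old_style` and `new_style` are names made up for this example.

```r
library(dplyr)

# Scoped-verb form from this answer
old_style <- iris %>% filter_at(1, all_vars(. > 6))

# Form using if_all() with the same column position
new_style <- iris %>% filter(if_all(1, ~ .x > 6))

# Compare the two results
all.equal(old_style, new_style)
```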
Upvotes: 15