Extracting columns from Data Frame based on a "formula"

Question

I have some data which looks like:

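  library(dplyr)
  library(stringr)

  data(iris)

  # Move Species first, rename it to Y, and rename the rest to X1..XN
  d <- iris %>%
    select(Species, everything()) %>%
    rename(Y = 1) %>%
    rename_at(vars(-c(1)), ~str_c("X", seq_along(.)))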

Data:

       Y  X1  X2  X3  X4
1 setosa 5.1 3.5 1.4 0.2
2 setosa 4.9 3.0 1.4 0.2
3 setosa 4.7 3.2 1.3 0.2
4 setosa 4.6 3.1 1.5 0.2
5 setosa 5.0 3.6 1.4 0.2
6 setosa 5.4 3.9 1.7 0.4

I add a random variable:

  d$noise <- rnorm(nrow(d))

I am trying to extract just the Y, X1, X2... XN variables (dynamically). What I currently have is:

d %>%
  select("Y", cat(paste0("X", seq_along(2:ncol(.)), collapse = ", ")))

This doesn't work: it counts the noise column when building the names, and it fails even without the noise column.

So I am trying to create a new data frame which just extracts the Y, X1, X2...XN columns.

Brendan A. · Accepted Answer

dplyr provides two select helper functions that you could use: contains for literal strings or matches for regular expressions.

In this case you could do

d %>%
  select("Y", contains("X"))

or

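d %>%
  select("Y", matches("^X\\d+$"))

The first one works in the example you provided but would also pick up any other variable whose name contains an "X" (matching ignores case by default). The second is more robust in that it will only capture variables whose names are "X" followed by one or more digits. Note the doubled backslash, which R string literals require, and the ^ and $ anchors, which stop the pattern from matching in the middle of a longer name.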
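
To see the difference between the two helpers, here is a small sketch. It assumes a cut-down version of the question's data plus a hypothetical extra_X column (not in the original post) whose name happens to contain an "X".

```r
library(dplyr)

# Toy data: a few rows from the question, the noise column, and a
# hypothetical "extra_X" column added purely for illustration.
d <- data.frame(
  Y = c("setosa", "setosa", "setosa"),
  X1 = c(5.1, 4.9, 4.7),
  X2 = c(3.5, 3.0, 3.2),
  noise = rnorm(3),
  extra_X = 0
)

# contains() matches the literal substring anywhere in the name,
# so extra_X is selected as well.
loose <- names(select(d, Y, contains("X")))

# matches() with an anchored regex keeps only "X" followed by digits.
strict <- names(select(d, Y, matches("^X\\d+$")))

print(loose)   # "Y" "X1" "X2" "extra_X"
print(strict)  # "Y" "X1" "X2"
```

If the column names in your real data are less regular than this, the pattern would need adjusting accordingly.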
