user113156

Reputation: 7107

Extracting columns from Data Frame based on a "formula"

I have some data which looks like:

  library(dplyr)
  library(stringr)

  data(iris)

  d <- iris %>%
    select(Species, everything()) %>%
    rename(Y = 1) %>%
    rename_at(vars(-c(1)), ~str_c("X", seq_along(.)))

Data:

       Y  X1  X2  X3  X4
1 setosa 5.1 3.5 1.4 0.2
2 setosa 4.9 3.0 1.4 0.2
3 setosa 4.7 3.2 1.3 0.2
4 setosa 4.6 3.1 1.5 0.2
5 setosa 5.0 3.6 1.4 0.2
6 setosa 5.4 3.9 1.7 0.4

I add a random variable:

  d$noise <- rnorm(nrow(d))

I am trying to extract just the Y, X1, X2... XN variables (dynamically). What I currently have is:

d %>%
  select("Y", cat(paste0("X", seq_along(2:ncol(.)), collapse = ", ")))

This doesn't work: it counts the noise column when generating the names, and it fails even without the noise column.

So I am trying to create a new data frame which just extracts the Y, X1, X2...XN columns.

Upvotes: 1

Views: 45

Answers (2)

akrun

Reputation: 886948

We can also use

d %>%
  select(Y, starts_with('X'))
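
For comparison, the approach from the question (building the names with paste0) can also be made to work. The sketch below assumes d has the layout shown in the question (Y, X1 to X4, plus noise) and rebuilds it so the example is self-contained. The names x_names, by_name and by_prefix are only for illustration. The key difference from the original attempt is that cat() prints its arguments and returns NULL, whereas any_of() accepts a character vector and skips names that do not exist.

```r
library(dplyr)

# Rebuild the question's data frame (assumed layout: Y, X1..X4, noise).
d <- iris %>%
  select(Species, everything()) %>%
  rename(Y = 1) %>%
  rename_with(~ paste0("X", seq_along(.x)), .cols = -1)
d$noise <- rnorm(nrow(d))

# Candidate names X1..Xk, deliberately more than actually exist.
x_names <- paste0("X", seq_len(ncol(d)))

# any_of() keeps the names that are present and ignores the rest.
by_name <- d %>% select(Y, any_of(x_names))

# starts_with() selects the same columns in this data.
by_prefix <- d %>% select(Y, starts_with("X"))

names(by_name)
```

Both versions drop the noise column here; starts_with() is shorter, while any_of() is handy when the names are already available as a character vector.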

Upvotes: 1

Brendan A.

Reputation: 1268

dplyr provides two select helper functions that you could use: contains for literal strings or matches for regular expressions.

In this case you could do

d %>%
  select("Y", contains("X"))

or

d %>%
  select("Y", matches("X\\d+"))

The first one works in the example you provided but would also select any other variable whose name contains an "X". The second is more robust in that it only captures variables whose names contain "X" followed by one or more digits; anchor the pattern as "^X\\d+$" if the whole name must have that form.
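
To make the difference concrete, here is a small sketch on a made-up data frame. The column names index and X1_scaled are hypothetical and not from the question. Keep in mind that both helpers ignore case by default (ignore.case = TRUE), so a lowercase "x" counts as a match too.

```r
library(dplyr)

# Hypothetical data: "index" and "X1_scaled" are invented to show over-matching.
toy <- data.frame(Y = 1, X1 = 2, X2 = 3, index = 4, X1_scaled = 5)

# Literal match anywhere in the name, case-insensitive by default.
with_contains <- names(select(toy, Y, contains("X")))

# Regex match anywhere in the name: "X" followed by at least one digit.
with_matches <- names(select(toy, Y, matches("X\\d+")))

# Anchored regex: the whole name must be "X" followed by digits.
with_anchored <- names(select(toy, Y, matches("^X\\d+$")))

with_contains  # Y, X1, X2, index, X1_scaled
with_matches   # Y, X1, X2, X1_scaled
with_anchored  # Y, X1, X2
```

If case matters, passing ignore.case = FALSE to either helper restricts the match to an uppercase "X".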

Upvotes: 3
