Prasanna S
Prasanna S

Reputation: 363

Select multiple columns conditionally in R

I have a data set named db0 that looks like this:

db0 <- data.frame(
  qty_day1_41 = c(7,    5,  2,  7,  8,  2,  6,  3,  3),
  qty_day1_42 = c(3,    8,  6,  9,  1,  9,  3,  7,  8),
  qty_day1_141 = c(4,   2,  2,  8,  7,  3,  9,  5,  8),
  qty_day1_142 = c(2,   10, 10, 5,  8,  5,  7,  1,  8),
  qty_day2_41 = c(7,    5,  2,  7,  8,  2,  6,  3,  3),
  qty_day2_42 = c(3,    8,  6,  9,  1,  9,  3,  7,  8),
  qty_day2_141 = c(4,   2,  2,  8,  7,  3,  9,  5,  8),
  qty_day2_142 = c(2,   10, 10, 5,  8,  5,  7,  1,  8)
  )

db0

  qty_day1_41 qty_day1_42 qty_day1_141 qty_day1_142 qty_day2_41 qty_day2_42
1           7           3            4            2           7           3
2           5           8            2           10           5           8
3           2           6            2           10           2           6
4           7           9            8            5           7           9
5           8           1            7            8           8           1
6           2           9            3            5           2           9
7           6           3            9            7           6           3
8           3           7            5            1           3           7
9           3           8            8            8           3           8
  qty_day2_141 qty_day2_142
1            4            2
2            2           10
3            2           10
4            8            5
5            7            8
6            3            5
7            9            7
8            5            1
9            8            8

The columns represent the quantity (qty) consumed in day n (n=1,2...14) of item k (k=41, 42,... 180). I want to select only items 41 and 42 and NOT 141 and 142. I tried this code:

library(tidyverse) 
db0 %>% select(matches("41|42"))

However, this selects items 141 and 142 also.

  qty_day1_41 qty_day1_42 qty_day1_141 qty_day1_142 qty_day2_41 qty_day2_42
1           7           3            4            2           7           3
2           5           8            2           10           5           8
3           2           6            2           10           2           6
4           7           9            8            5           7           9
5           8           1            7            8           8           1
6           2           9            3            5           2           9
7           6           3            9            7           6           3
8           3           7            5            1           3           7
9           3           8            8            8           3           8
  qty_day2_141 qty_day2_142
1            4            2
2            2           10
3            2           10
4            8            5
5            7            8
6            3            5
7            9            7
8            5            1
9            8            8

How can I select the columns with names containing the numbers 41 and 42 only (either using dplyr or otherwise)?

Upvotes: 1

Views: 105

Answers (1)

akrun
akrun

Reputation: 887118

We can specify the pattern with _ before the digits and $ to specify the end of the string

library(dplyr)
db0 %>% 
    select(matches("_(41|42)$"))

Upvotes: 2

Related Questions