How to select a series of variables efficiently in R?

Question

I have a series of variables named "HPV_x_ALL". The only difference between the names is the x, which is a number (e.g., 11, 16, 18, 33). I'd like to use rowSums to add up the values of the HPV_x_ALL variables for each observation. I tried using * to stand in for the numbers, but it doesn't work. Thank you!

Update: Hi, I added a reproducible dataset.

structure(list(HPV_16_ALL = c(1L, NA, 0L, 0L, 0L, 0L), HPV_18_ALL = c(0L, 
NA, 0L, 0L, 0L, 0L), HPV_33_ALL = c(0L, NA, 0L, 0L, 0L, 0L)), row.names = 40:45, class = "data.frame")

dc37 · Accepted Answer

Without a reproducible example, it is difficult to be sure that this answer will be appropriate.

However, starting from this dummy example:

set.seed(123)
df <- data.frame(Var = c(paste0("HPV_",11:15,"_ALL"),paste0("BPV_",11:15,"_ALL")),
                 Val = sample(1:100,10))

          Var Val
1  HPV_11_ALL  31
2  HPV_12_ALL  79
3  HPV_13_ALL  51
4  HPV_14_ALL  14
5  HPV_15_ALL  67
6  BPV_11_ALL  42
7  BPV_12_ALL  50
8  BPV_13_ALL  43
9  BPV_14_ALL  97
10 BPV_15_ALL  25

You can get the indices of the rows matching "HPV_xx_ALL" (note that the backslash must be doubled inside an R string):

grep("HPV_\\d{2}_ALL", df$Var, perl = TRUE)

[1] 1 2 3 4 5

So, you can sum the values in the rows that match the pattern by doing:

sum(df[grep("HPV_\\d{2}_ALL", df$Var, perl = TRUE), "Val"])

[1] 242

If the HPV_xx_ALL names are column names rather than values, apply the same pattern to names(df):

rowSums(df[, grep("HPV_\\d{2}_ALL", names(df), perl = TRUE)])
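Applied to the sample data posted in the question, the column-name approach might look like the sketch below. The object names (dat, hpv_cols, hpv_total, hpv_total_narm) are only illustrative. Using \\d+ with ^ and $ anchors instead of \\d{2} is my own assumption, so that numbers with any digit count match and nothing else does. Whether to pass na.rm = TRUE depends on how you want rows that are entirely NA to be treated.

```r
# The asker's sample data, copied from the question.
dat <- structure(list(HPV_16_ALL = c(1L, NA, 0L, 0L, 0L, 0L),
                      HPV_18_ALL = c(0L, NA, 0L, 0L, 0L, 0L),
                      HPV_33_ALL = c(0L, NA, 0L, 0L, 0L, 0L)),
                 row.names = 40:45, class = "data.frame")

# Indices of the columns whose names look like HPV_<digits>_ALL.
hpv_cols <- grep("^HPV_\\d+_ALL$", names(dat), perl = TRUE)

# drop = FALSE keeps a data frame even if only one column matches,
# which rowSums() needs.
hpv_total <- rowSums(dat[, hpv_cols, drop = FALSE])

# With na.rm = TRUE, a row that is entirely NA sums to 0 instead of NA.
hpv_total_narm <- rowSums(dat[, hpv_cols, drop = FALSE], na.rm = TRUE)
```

As for why * did not work: grep() takes a regular expression, not a shell-style wildcard, so * means "zero or more of the preceding character" rather than "anything". If you prefer to write the wildcard form, utils::glob2rx("HPV_*_ALL") converts it to the equivalent regular expression.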

Does this answer your question? If not, please provide a reproducible example of your dataset (see: How to make a great R reproducible example).
