Michela

Reputation: 33

How to sum the values of different columns in a dataframe by looping over the variable names

I'm relatively new to R (used to work in Stata before) so sorry if the question is too trivial.

I have a dataframe whose variables are named sequentially as q12.X.Y, where X takes values from 1 to 9 and Y from 1 to 5.

I need to sum the values of all the q12.X.Y variables with Y from 1 to 3 (but NOT those ending in 4 or 5).

Ideally I would have written a loop based on the sequential numbers of the variables, namely something like:

df$test <- 0
for(i in 1:9){
     for(j in 1:3){
       df$test <- df$test+ df$q12.i.j
      }
 }

That obviously does not work.

I also tried the "rowSums" and "subset" commands:

df$test <- rowSums(subset(df, select = ...))

However, I find this a bit cumbersome, as the column numbers are not sequential and I do not want to type the names of all the variables.

Any suggestions on how to do that?

Upvotes: 1

Views: 53

Answers (1)

akrun

Reputation: 887501

We can use grep to find the columns whose names match the pattern

rowSums(df[grep("q12\\.[1-9]\\.[1-3]", names(df))])
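As a quick check, here is a minimal sketch on a made-up data frame (the toy columns and values are assumptions, not the asker's data) showing which columns such a pattern picks up. The `^` and `$` anchors are an addition in this sketch, so that a name like q12.1.10 would not match if it ever existed.

```r
# Toy data: 2 rows, columns q12.1.1 ... q12.9.5, every value set to 1 (hypothetical)
cols <- paste0("q12.", rep(1:9, each = 5), ".", rep(1:5, times = 9))
df <- as.data.frame(matrix(1, nrow = 2, ncol = length(cols),
                           dimnames = list(NULL, cols)))

# Keep only the columns whose last index is 1, 2 or 3
keep <- grep("^q12\\.[1-9]\\.[1-3]$", names(df), value = TRUE)
length(keep)   # 27 columns selected (9 values of X times 3 values of Y)

# Row-wise sum over the selected columns only
df$test <- rowSums(df[keep])
```

Using `value = TRUE` returns the matching names rather than their positions, which makes it easy to inspect the selection before summing.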

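or, if all the column names are present, use an exact match by building the column names with paste0 (the suffixes need each = 9; recycling a bare 1:3 against the nine prefixes would only ever produce nine distinct names, each repeated three times)

rowSums(df[paste0(rep(paste0("q12.", 1:9, "."), 3), rep(1:3, each = 9))])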
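For completeness, the loop from the question can probably be made to work by building each column name as a string and indexing with `[[`, since `df$q12.i.j` looks for a column literally called q12.i.j rather than substituting i and j. A minimal sketch on made-up data (the toy values and the helper name `wanted` are assumptions for illustration):

```r
# Toy data: 2 rows, columns q12.1.1 ... q12.9.5 filled with 1..90 (hypothetical)
cols <- paste0("q12.", rep(1:9, each = 5), ".", rep(1:5, times = 9))
df <- as.data.frame(matrix(seq_len(2 * length(cols)), nrow = 2,
                           dimnames = list(NULL, cols)))

df$test <- 0
for (i in 1:9) {
  for (j in 1:3) {
    # Build the name as a string; df[[name]] looks the column up by that string
    df$test <- df$test + df[[paste0("q12.", i, ".", j)]]
  }
}

# Compare with the vectorised version over the same 27 exact names
wanted <- paste0("q12.", rep(1:9, times = 3), ".", rep(1:3, each = 9))
all.equal(df$test, unname(rowSums(df[wanted])))
```

The loop is closer to how one would write it in Stata, but the rowSums versions are shorter and avoid growing the sum one column at a time.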

Upvotes: 1
