How to repetitively replace substrings in variables in R

Question

I've got the following task:

Treatment$V010 <- as.numeric(substr(Treatment$V010,1,2))
Treatment$V020 <- as.numeric(substr(Treatment$V020,1,2))
[...]
Treatment$V1000 <- as.numeric(substr(Treatment$V1000,1,2))

I have 100 variables, from $V010, $V020, $V030, ... to $V1000. They hold numbers of different lengths. I want to extract just the first two digits of each number and replace the old number with the new two-digit one.

My data frame "Treatment" has 80 more variables which I did not mention here, so the operation should be applied only to the 100 variables mentioned.

How can I do that? I could write that command 100 times, but I am sure there is a better solution.

Jealie · Accepted Answer

Alright, let's do it. First things first: since you want specific columns of your data frame, you need to build their names to access them:

cnames = paste0('V',formatC(seq(10,1000,by=10), width = 3, format = "d", flag = "0"))

(cnames is a vector containing c('V010','V020', ..., 'V1000'))
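As a quick sanity check on the names, here is a small sketch that builds the same vector a second way with sprintf() and compares the two. cnames_alt is just an illustrative name, not something from the question:

```r
# Build the 100 column names V010, V020, ..., V1000 as in the answer
cnames <- paste0("V", formatC(seq(10, 1000, by = 10), width = 3, format = "d", flag = "0"))

# sprintf() with %03d also zero-pads to a minimum width of 3
cnames_alt <- sprintf("V%03d", seq(10, 1000, by = 10))

# Inspect both ends of the vector and confirm the two versions agree
print(head(cnames, 3))
print(tail(cnames, 2))
print(identical(cnames, cnames_alt))
```

Because the width is only a minimum, the last name comes out as V1000 rather than being truncated.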

Next, we will get their indexes:

coli=unlist(sapply(cnames, function (x) which(colnames(Treatment)==x)))

(coli is a vector containing the indexes in Treatment of the relevant columns)
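If you do want the indexes, %in% or match() gets them in one call. The toy data frame below is made up for illustration, with V030 left out on purpose to show how each variant treats a missing column:

```r
# Hypothetical toy data: one unrelated column plus two target columns
Treatment <- data.frame(id = 1:3,
                        V010 = c(1234, 56, 789),
                        V020 = c(98765, 4321, 10))
cnames <- c("V010", "V020", "V030")  # V030 does not exist in Treatment

# Positions of the columns whose names appear in cnames;
# names that are absent from the data frame are simply dropped
coli <- which(colnames(Treatment) %in% cnames)

# match() follows the order of cnames and gives NA for absent names
coli_match <- match(cnames, colnames(Treatment))

print(coli)
print(coli_match)
```

The which() version is the one that is safe to use for indexing, since it never contains NA.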

Finally, we will apply your function over these columns:

Treatment[coli] = mapply(function (x) as.numeric(substr(x, 1, 2)), Treatment[coli])
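To see the whole thing on something small, here is a sketch with a made-up three-row data frame (the values and the id column are invented). It uses lapply() instead of mapply(); lapply() always returns a list with one element per column, which is the more conventional form for this kind of assignment:

```r
# Hypothetical toy data: an untouched column plus two target columns
Treatment <- data.frame(
  id   = c("a", "b", "c"),
  V010 = c(1234, 56, 789),
  V020 = c(98765, 4321, 10),
  stringsAsFactors = FALSE
)
cnames <- c("V010", "V020")

# Keep the first two characters of each value and convert back to numeric;
# only the columns named in cnames are replaced
Treatment[cnames] <- lapply(Treatment[cnames],
                            function(x) as.numeric(substr(x, 1, 2)))

print(Treatment)
```

One thing to check on your real data: substr() converts numeric columns with as.character() first, and for some values that gives scientific notation (as.character(100000) is "1e+05"), so the first two characters would be "1e". If your columns are numeric and contain such values, formatting them explicitly first, for example with format(x, scientific = FALSE, trim = TRUE), should avoid that.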

Does it work?

PS: if anyone has a better/more concise way to do it, please tell me :)

EDIT:

The intermediate step is unnecessary, since you can index the relevant columns directly by name with cnames, i.e.

Treatment[cnames] = mapply(function (x) as.numeric(substr(x, 1, 2)), Treatment[cnames])

(the only advantage of converting column names to column indexes shows up when some columns are missing from the data frame: the index version simply skips them, whereas Treatment['non existing column'] fails with the error "undefined columns selected")
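If some columns might be missing and you still want to index by name, one option is to keep only the names that exist, using intersect(). A minimal sketch with invented data, where V020 is absent on purpose and present is an illustrative variable name:

```r
# Hypothetical toy data: V020 is deliberately missing
Treatment <- data.frame(id = 1:2, V010 = c(1234, 56))
cnames <- c("V010", "V020")

# Keep only the requested names that are actually columns of Treatment
present <- intersect(cnames, colnames(Treatment))

# Same transformation as before, restricted to the existing columns
Treatment[present] <- lapply(Treatment[present],
                             function(x) as.numeric(substr(x, 1, 2)))

print(present)
print(Treatment)
```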
