Reputation: 919
I've got the following task:
Treatment$V010 <- as.numeric(substr(Treatment$V010,1,2))
Treatment$V020 <- as.numeric(substr(Treatment$V020,1,2))
[...]
Treatment$V1000 <- as.numeric(substr(Treatment$V1000,1,2))
I have 100 variables, from $V010, $V020, $V030... to $V1000. They contain numbers of different lengths. I want to extract just the first two digits of each number and replace the old number with the new two-digit number.
My data frame "Treatment" has 80 more variables that I did not mention here, so the operation should be applied only to the 100 variables mentioned.
How can I do that? I could write that command 100 times but I am sure there is a better solution.
Upvotes: 0
Views: 620
Reputation: 67828
Here is a solution that selects the relevant columns by matching their names against a regular expression.
Regex explanation:
^ : Start of string
V : Literal V
\\d{2} : Exactly 2 digits
Treatment <- data.frame(V010 = c(120, 130), x010 = c(120, 130), xV1000 = c(111, 222), V1000 = c(111, 222))
Treatment
# V010 x010 xV1000 V1000
# 1 120 120 111 111
# 2 130 130 222 222
# columns with a name that matches the pattern (logical vector)
idx <- grepl(x = names(Treatment), pattern = "^V\\d{2}")
# substr the relevant columns
Treatment[ , idx] <- sapply(Treatment[ , idx], FUN = function(x){
as.numeric(substr(x, 1, 2))
})
Treatment
# V010 x010 xV1000 V1000
# 1 12 120 111 11
# 2 13 130 222 22
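Note that the pattern above has no end anchor, so it matches any name that starts with V followed by at least two digits. If the data frame might also contain names such as V010_old (a hypothetical name, not from the question), a stricter variant could be sketched like this, assuming the target names always have three or four digits:

```r
# Hypothetical column names, used only to illustrate the stricter pattern
nms <- c("V010", "V1000", "V010_old", "xV1000", "V5")

# ^ and $ anchor both ends; \\d{3,4} allows three or four digits
strict <- grepl(pattern = "^V\\d{3,4}$", x = nms)
strict
# [1]  TRUE  TRUE FALSE FALSE FALSE
```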
Upvotes: 1
Reputation: 6277
Alright, let's do it. First things first: since you want to work on specific columns of your data frame, you need to build their names in order to access them:
cnames = paste0('V',formatC(seq(10,1000,by=10), width = 3, format = "d", flag = "0"))
(cnames is a vector containing c('V010', 'V020', ..., 'V1000').)
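A quick sanity check of the generated names might look like the sketch below. It uses only base R; the point is that width = 3 is a minimum width, so the four-digit 1000 is not truncated.

```r
# Build the 100 column names V010, V020, ..., V1000
cnames <- paste0("V", formatC(seq(10, 1000, by = 10), width = 3, format = "d", flag = "0"))

length(cnames)
# [1] 100
head(cnames, 3)
# [1] "V010" "V020" "V030"
tail(cnames, 2)
# [1] "V990"  "V1000"
```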
Next, we will get their indexes:
coli=unlist(sapply(cnames, function (x) which(colnames(Treatment)==x)))
(coli is a vector containing the indexes in Treatment of the relevant columns.)
Finally, we will apply your function over these columns:
Treatment[coli] = mapply(function (x) as.numeric(substr(x, 1, 2)), Treatment[coli])
Does it work?
PS: if anyone has a better/more concise way to do it, please tell me :)
EDIT:
The intermediate step is not needed, as you can use the column names in cnames directly to select the relevant columns, i.e.
Treatment[cnames] = mapply(function (x) as.numeric(substr(x, 1, 2)), Treatment[cnames])
(The only advantage of converting column names to column indexes shows up when some columns are missing from the data frame: in that case, Treatment['non existing column'] fails with the error "undefined columns selected".)
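If missing columns are a concern, one possible way to keep name-based indexing is to intersect the generated names with the names actually present. This is only a sketch: the toy data frame and the variable present are made up for illustration, and lapply is swapped in for mapply because it returns one list element per column.

```r
# Toy data frame (hypothetical): only V010 and V1000 exist, V020 ... V990 are missing
Treatment <- data.frame(V010 = c(120, 130), other = c(5, 6), V1000 = c(111, 222))

cnames <- paste0("V", formatC(seq(10, 1000, by = 10), width = 3, format = "d", flag = "0"))

# Keep only the names that are really present, so indexing cannot fail
present <- intersect(cnames, names(Treatment))

# Replace each selected column with its first two digits
Treatment[present] <- lapply(Treatment[present], function(x) as.numeric(substr(x, 1, 2)))

Treatment
#   V010 other V1000
# 1   12     5    11
# 2   13     6    22
```

Columns not listed in present (here, other) are left untouched.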
Upvotes: 3