I have a character vector of identifiers that looks like this:
v.list <- c('AM','EM','SMH')
I would like to calculate new columns in a data set based on existing columns whose names each end in one of these identifiers. Here is some example data to illustrate:
library(data.table)
height.pre.AM <- rnorm(10)
height.pre.EM <- rnorm(10)
height.pre.SMH <- rnorm(10)
height.post.AM <- rnorm(10)
height.post.EM <- rnorm(10)
height.post.SMH <- rnorm(10)
d <- data.table(height.pre.AM, height.pre.EM, height.pre.SMH,
                height.post.AM, height.post.EM, height.post.SMH)
I would then like to calculate three new columns: the change in height between the pre and post columns for each identifier in the list. I can do this with three lines like these:
d[,delta_EM := height.post.EM - height.pre.EM ]
d[,delta_AM := height.post.AM - height.pre.AM ]
d[,delta_SMH := height.post.SMH - height.pre.SMH]
How can I do this in a single line, using the vector v.list defined above?
I tried a for loop constructed as:
for(i in 1: length(v.list)){
v <- (noquote(paste(v.list[i])))
pre <- paste("d[,delta_",v,":= height.post.",v," - height.pre.",v,"]",sep="")
cat(noquote(pre), sep="\n")
}
However, this just prints the lines, rather than executing them.
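cat() only writes the generated text to the console; to run it, each string has to be parsed into an expression and evaluated. A minimal base-R sketch of that idea, assuming a plain data.frame instead of a data.table so it runs without extra packages (the `$<-` command strings are an assumption, not the data.table syntax from the question):

```r
v.list <- c("AM", "EM", "SMH")

# Small example data frame with one pre and one post column per identifier
set.seed(1)
d <- data.frame(height.pre.AM = rnorm(10), height.pre.EM = rnorm(10),
                height.pre.SMH = rnorm(10), height.post.AM = rnorm(10),
                height.post.EM = rnorm(10), height.post.SMH = rnorm(10))

for (v in v.list) {
  # Build a command string, as the loop in the question does ...
  cmd <- paste0("d$delta_", v, " <- d$height.post.", v, " - d$height.pre.", v)
  # ... but parse and evaluate it instead of printing it
  eval(parse(text = cmd))
}

head(d[, paste0("delta_", v.list)])
```

The same `eval(parse(text = ...))` call should also run the data.table strings from the question, although code built as text tends to be harder to debug than the `get()`-based answers.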
Here you go:
for (v in v.list)
d[, paste0('delta_', v) := get(paste0('height.post.', v)) -
get(paste0('height.pre.', v))]
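For a plain data.frame, the same loop can be written without get() by looking each column up by its constructed name with `[[`. A sketch, assuming d is a data.frame rather than a data.table:

```r
v.list <- c("AM", "EM", "SMH")
set.seed(42)
d <- data.frame(height.pre.AM = rnorm(5), height.pre.EM = rnorm(5),
                height.pre.SMH = rnorm(5), height.post.AM = rnorm(5),
                height.post.EM = rnorm(5), height.post.SMH = rnorm(5))

# Look up each pre/post column by name and store the difference as a new column
for (v in v.list) {
  d[[paste0("delta_", v)]] <- d[[paste0("height.post.", v)]] -
    d[[paste0("height.pre.", v)]]
}
```

On a data.table, `set(d, j = paste0("delta_", v), value = ...)` is the loop-oriented counterpart of `:=`.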
Here's one way:
library(dplyr)
calculate_delta <- function(df, id) {
  calc_string <- paste0('height.post.', id, ' - height.pre.', id)
  # mutate_() evaluates the string as an expression (deprecated in newer dplyr versions)
  mutate_(df, delta = calc_string)$delta
}
vector_list <- setNames(lapply(v.list, function(x) calculate_delta(d, x)), v.list)
You could cram it into one line if you wanted, but that wouldn't be very readable.
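If only the named list of delta vectors is needed, base R can build it without dplyr, and the list can then be attached as new columns. A sketch, assuming a small stand-in data.frame laid out like the question's data:

```r
v.list <- c("AM", "EM", "SMH")
set.seed(7)
d <- data.frame(height.pre.AM = rnorm(4), height.pre.EM = rnorm(4),
                height.pre.SMH = rnorm(4), height.post.AM = rnorm(4),
                height.post.EM = rnorm(4), height.post.SMH = rnorm(4))

# One post-minus-pre vector per identifier, named by identifier
vector_list <- setNames(
  lapply(v.list, function(id) {
    d[[paste0("height.post.", id)]] - d[[paste0("height.pre.", id)]]
  }),
  v.list
)

# Optionally add all three vectors to d as delta_* columns in one assignment
d[paste0("delta_", v.list)] <- vector_list
```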
There is probably a better way, but here's what I came up with that seems to work. You could use lapply() and get() inside the data table.
d[, paste0("delta_", v.list) := lapply(v.list, function(x) {
s <- sort(grep(x, names(d), fixed = TRUE, value = TRUE))
get(s[1]) - get(s[2])
})]
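Matching with `grep(x, names(d), fixed = TRUE)` assumes that no identifier is a substring of another column name (an identifier such as M would match every column below) and relies on sort() placing post before pre. A slightly stricter sketch that anchors the identifier at the end of the name; the helper pair_cols and the example names are hypothetical:

```r
# Hypothetical helper: find the post and pre column names for one identifier
pair_cols <- function(nms, id) {
  # Only names ending in ".<id>" (assumes id contains no regex metacharacters)
  hits <- grep(paste0("\\.", id, "$"), nms, value = TRUE)
  post <- grep("post", hits, fixed = TRUE, value = TRUE)
  pre  <- grep("pre", hits, fixed = TRUE, value = TRUE)
  stopifnot(length(post) == 1, length(pre) == 1)
  c(post = post, pre = pre)
}

nms <- c("height.pre.M", "height.pre.AM", "height.post.M", "height.post.AM")

# A plain fixed-string match on "M" picks up all four columns
naive <- grep("M", nms, fixed = TRUE, value = TRUE)

# The anchored match returns only the two columns that belong to "M"
pair_cols(nms, "M")
```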
Alternatively, you could parse and evaluate some expressions.
cols <- lapply(v.list, function(x) {
  # sort() puts height.post.* before height.pre.*, so the expression is post - pre
  g <- sort(grep(paste0("p(ost|re)\\.", x), names(d), value = TRUE))
  eval(parse(text = paste(g, collapse = "-")), envir = d)
})
d[, paste0("delta_", v.list) := cols]
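Instead of pasting code as text and calling parse(), the expression can also be built directly from symbols with as.name() and bquote(); this keeps working for column names that are not syntactically valid R names, which parse(text = ...) would reject. A base-R sketch on a stand-in data.frame (not the question's data.table):

```r
v.list <- c("AM", "EM", "SMH")
set.seed(3)
d <- data.frame(height.pre.AM = rnorm(5), height.pre.EM = rnorm(5),
                height.pre.SMH = rnorm(5), height.post.AM = rnorm(5),
                height.post.EM = rnorm(5), height.post.SMH = rnorm(5))

cols <- lapply(v.list, function(x) {
  post <- as.name(paste0("height.post.", x))
  pre  <- as.name(paste0("height.pre.", x))
  # Builds the call  height.post.<x> - height.pre.<x>  without parsing any text
  expr <- bquote(.(post) - .(pre))
  # Evaluate with the data frame's columns as variables
  eval(expr, envir = d)
})
names(cols) <- paste0("delta_", v.list)
d[names(cols)] <- cols
```

With a data.table, the resulting list can be assigned the same way as in the answer, via `d[, paste0("delta_", v.list) := cols]`.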
Another possibility is to transform your data into long format first. With the enhanced melt function from data.table you can specify multiple measures with patterns() and, as a result, create more than one value column (in this case a pre and a post value column):
melt(d, measure.vars = patterns("pre","post"),
value.name = c("height.pre","height.post"))[, variable := v.list[variable]
][, delta_height := height.post - height.pre][]
which gives:
variable height.pre height.post delta_height
1: AM 1.51181796 0.20232291 -1.3094951
2: AM 0.65902517 0.51772371 -0.1413015
3: AM 1.12202807 1.67814321 0.5561151
4: AM -0.78464137 0.38524481 1.1698862
5: AM -0.42569229 -1.28188722 -0.8561949
6: AM 0.39299759 -0.58215074 -0.9751483
7: AM 0.03675713 1.77411869 1.7373616
8: AM -1.03208366 -0.21067198 0.8214117
9: AM -1.26486147 -0.35210691 0.9127546
10: AM -0.22696529 0.58517233 0.8121376
11: EM 0.74558930 1.01368470 0.2680954
12: EM 0.33281918 -0.02256943 -0.3553886
.....
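For comparison, base R's reshape() can produce a similar long table without data.table. A sketch, assuming a data.frame laid out like the question's data; reshape() itself adds an id column:

```r
v.list <- c("AM", "EM", "SMH")
set.seed(1)
d <- data.frame(height.pre.AM = rnorm(10), height.pre.EM = rnorm(10),
                height.pre.SMH = rnorm(10), height.post.AM = rnorm(10),
                height.post.EM = rnorm(10), height.post.SMH = rnorm(10))

long <- reshape(
  d,
  direction = "long",
  # One vector of wide columns per measure, in the same order as v.list
  varying   = list(paste0("height.pre.", v.list), paste0("height.post.", v.list)),
  v.names   = c("height.pre", "height.post"),
  timevar   = "variable",
  times     = v.list
)
long$delta_height <- long$height.post - long$height.pre
head(long)
```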
You may want to consider using the dplyr and tidyr packages, as they lend themselves very well to this kind of operation and produce readable, neat workflow code.
library(dplyr)
library(tidyr)
dComplete <- d %>%
  gather(key = indPre, value = valPre, contains("pre")) %>%
  gather(key = indPost, value = valPost, contains("post")) %>%
  # the two gathers pair every pre column with every post column;
  # keep only pairs that refer to the same identifier
  filter(sub(".*\\.", "", indPre) == sub(".*\\.", "", indPost)) %>%
  mutate(diff = valPost - valPre)
I used set.seed(1) for reproducibility:
set.seed(1)
height.pre.AM <- rnorm(10)
height.pre.EM <- rnorm(10)
height.pre.SMH <- rnorm(10)
height.post.AM <- rnorm(10)
height.post.EM <- rnorm(10)
height.post.SMH <- rnorm(10)
d<- data.frame(height.pre.AM, height.pre.EM, height.pre.SMH,
height.post.AM,height.post.EM,height.post.SMH)
> head(dComplete)
indPre valPre indPost valPost diff
1 height.pre.AM 0.2426995 height.post.AM -1.0155539 -1.2582534
2 height.pre.AM -0.7978763 height.post.AM 0.7602261 1.5581023
3 height.pre.AM -0.2440429 height.post.AM -1.7585200 -1.5144772
4 height.pre.AM -1.4228071 height.post.AM 0.7663306 2.1891377
5 height.pre.AM 1.6237066 height.post.AM 1.0676800 -0.5560266
6 height.pre.AM 0.3561212 height.post.AM -0.4366372 -0.7927584
If desired, you can later spread() the values back into wide format; it depends on how you want to use the data.
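Chaining two gather() calls pairs every pre column with every post column, so without a filter the result includes mismatched differences such as height.post.EM minus height.pre.AM. A small base-R sketch of that pairing logic, working on the column names only:

```r
v.list <- c("AM", "EM", "SMH")
pre_cols  <- paste0("height.pre.", v.list)
post_cols <- paste0("height.post.", v.list)

# Every pre/post combination, i.e. the pairings two chained gathers produce per row
combos <- expand.grid(indPre = pre_cols, indPost = post_cols,
                      stringsAsFactors = FALSE)

# Keep only pairs whose identifier (the text after the last ".") agrees
same_id <- sub(".*\\.", "", combos$indPre) == sub(".*\\.", "", combos$indPost)
matched <- combos[same_id, ]
matched
```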