colin

Reputation: 2666

simple for-loop or lapply solution needed

I have a list of identifiers that looks like this:

v.list <- c('AM','EM','SMH')

I would like to calculate new columns in a data set from existing columns, each of which is uniquely identified by one of these identifiers. Here are some example data to illustrate:

height.pre.AM   <- rnorm(10)
height.pre.EM   <- rnorm(10)
height.pre.SMH  <- rnorm(10)
height.post.AM  <- rnorm(10)
height.post.EM  <- rnorm(10)
height.post.SMH <- rnorm(10)
library(data.table)
d <- data.table(height.pre.AM,height.pre.EM,height.pre.SMH,height.post.AM,height.post.EM,height.post.SMH)

I would then like to calculate 3 new vectors, the change in height between pre and post vectors, by each identifier in the list. I can do this with 3 lines that look like this:

d[,delta_EM  := height.post.EM  - height.pre.EM ]
d[,delta_AM  := height.post.AM  - height.pre.AM ]
d[,delta_SMH := height.post.SMH - height.pre.SMH]

How can I do this in a single line, using the identifiers stored in v.list above?

I tried a for loop constructed as:

for(i in 1:  length(v.list)){
  v   <- (noquote(paste(v.list[i]))) 
  pre <- paste("d[,delta_",v,":= height.post.",v," - height.pre.",v,"]",sep="")
  cat(noquote(pre), sep="\n")
}

However, this just prints the lines, rather than executing them.

Upvotes: 1

Views: 110

Answers (5)

eddi

Reputation: 49448

Here you go:

for (v in v.list)
  d[, paste0('delta_', v) := get(paste0('height.post.', v)) -
                             get(paste0('height.pre.', v))]
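
This works because := accepts a character string as the new column name, and get() looks up a column by name inside d[...]. A minimal self-contained sketch, assuming the data.table package is installed, that checks the loop against the hand-written calculation from the question (the name expected_EM is introduced here for illustration only):

```r
library(data.table)

set.seed(1)
v.list <- c('AM', 'EM', 'SMH')

# Example table: one pre and one post column per identifier
d <- data.table(
  height.pre.AM   = rnorm(10),
  height.pre.EM   = rnorm(10),
  height.pre.SMH  = rnorm(10),
  height.post.AM  = rnorm(10),
  height.post.EM  = rnorm(10),
  height.post.SMH = rnorm(10)
)

# Add one delta column per identifier, modifying d by reference
for (v in v.list) {
  d[, paste0('delta_', v) := get(paste0('height.post.', v)) -
                             get(paste0('height.pre.', v))]
}

# Compare one new column with the explicit calculation
expected_EM <- d$height.post.EM - d$height.pre.EM
all.equal(d$delta_EM, expected_EM)
```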

Upvotes: 3

andrewelamb

Reputation: 19

Here's one way:

library(dplyr)

calculate_delta <-function(df, id){
    calc_string <- paste('height.post.', id, ' - height.pre.', id, sep = '')
    vector <- mutate_(df, 'delta' = calc_string)$delta
}


vector_list <- setNames(lapply(v.list, function(x) calculate_delta(d, x)), v.list)

You could cram it into one line if you wanted, but that wouldn't be very readable.
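
Note that mutate_() has since been deprecated in dplyr. A minimal sketch of the same idea for newer versions, assuming dplyr 0.7 or later so that the .data pronoun is available, and using a plain data frame as a stand-in for d. The helper name calculate_delta_tidy is hypothetical:

```r
library(dplyr)

set.seed(1)
v.list <- c('AM', 'EM', 'SMH')
d <- data.frame(
  height.pre.AM   = rnorm(10),
  height.pre.EM   = rnorm(10),
  height.pre.SMH  = rnorm(10),
  height.post.AM  = rnorm(10),
  height.post.EM  = rnorm(10),
  height.post.SMH = rnorm(10)
)

# Hypothetical helper: returns post - pre for one identifier
calculate_delta_tidy <- function(df, id) {
  post <- paste0('height.post.', id)
  pre  <- paste0('height.pre.', id)
  # .data[[name]] selects a column whose name is held in a string
  mutate(df, delta = .data[[post]] - .data[[pre]])$delta
}

# One delta vector per identifier, named by identifier
vector_list <- setNames(lapply(v.list, function(x) calculate_delta_tidy(d, x)), v.list)
```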

Upvotes: 1

Rich Scriven

Reputation: 99361

There is probably a better way, but here's what I came up with that seems to work. You could use lapply() and get() inside the data table.

d[, paste0("delta_", v.list) := lapply(v.list, function(x) {
    s <- sort(grep(x, names(d), fixed = TRUE, value = TRUE)) 
    get(s[1]) - get(s[2]) 
})]

Alternatively, you could parse and evaluate some expressions.

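cols <- lapply(v.list, function(x) {
    # sort() puts the "post" column before the "pre" column,
    # so the evaluated expression is post - pre
    g <- sort(grep(paste0("p(ost|re)\\.", x), names(d), value = TRUE))
    eval(parse(text = paste(g, collapse = "-")), envir = d)
})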

d[, paste0("delta_", v.list) := cols]
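
Both versions find the columns with grep(), which could pick up extra columns if other column names happen to contain an identifier (for example, after the delta columns have already been added once). A hedged sketch that builds the column names explicitly instead, assuming the columns always follow the height.pre.<id> / height.post.<id> naming from the question. The names pre_cols, post_cols, new_cols and deltas are introduced here for illustration:

```r
library(data.table)

set.seed(1)
v.list <- c('AM', 'EM', 'SMH')
d <- data.table(
  height.pre.AM   = rnorm(10),
  height.pre.EM   = rnorm(10),
  height.pre.SMH  = rnorm(10),
  height.post.AM  = rnorm(10),
  height.post.EM  = rnorm(10),
  height.post.SMH = rnorm(10)
)

# Construct the exact column names rather than searching for them
pre_cols  <- paste0("height.pre.",  v.list)
post_cols <- paste0("height.post.", v.list)
new_cols  <- paste0("delta_", v.list)

# Pairwise post - pre for each identifier, returned as a list of vectors
deltas <- Map(`-`, d[, post_cols, with = FALSE], d[, pre_cols, with = FALSE])

# Assign all new columns in one step, by reference
d[, (new_cols) := deltas]
```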

Upvotes: 3

Jaap

Reputation: 83265

Another possibility is to transform your data into long format first. With the enhanced melt function from data.table you can use multiple measures by patterns and as a result create more than one value column (in this case a pre and a post value column):

melt(d, measure.vars = patterns("pre","post"), 
     value.name = c("height.pre","height.post"))[, variable := v.list[variable]
                                                 ][, delta_height := height.post - height.pre][]

which gives:

    variable  height.pre height.post delta_height
 1:       AM  1.51181796  0.20232291   -1.3094951
 2:       AM  0.65902517  0.51772371   -0.1413015
 3:       AM  1.12202807  1.67814321    0.5561151
 4:       AM -0.78464137  0.38524481    1.1698862
 5:       AM -0.42569229 -1.28188722   -0.8561949
 6:       AM  0.39299759 -0.58215074   -0.9751483
 7:       AM  0.03675713  1.77411869    1.7373616
 8:       AM -1.03208366 -0.21067198    0.8214117
 9:       AM -1.26486147 -0.35210691    0.9127546
10:       AM -0.22696529  0.58517233    0.8121376
11:       EM  0.74558930  1.01368470    0.2680954
12:       EM  0.33281918 -0.02256943   -0.3553886
.....
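
For anyone who wants to run this end to end, here is a self-contained sketch of the same approach, assuming data.table is installed. Note that v.list[variable] relies on the pre and post columns appearing in the same order as the identifiers in v.list. The name long is introduced here for illustration:

```r
library(data.table)

set.seed(1)
v.list <- c('AM', 'EM', 'SMH')
d <- data.table(
  height.pre.AM   = rnorm(10),
  height.pre.EM   = rnorm(10),
  height.pre.SMH  = rnorm(10),
  height.post.AM  = rnorm(10),
  height.post.EM  = rnorm(10),
  height.post.SMH = rnorm(10)
)

# Melt the pre and post column groups into two value columns
long <- melt(d, measure.vars = patterns("pre", "post"),
             value.name = c("height.pre", "height.post"))

# variable is a factor (1, 2, 3) giving each column's position within its
# pattern group; map it back to the identifier
long[, variable := v.list[variable]]
long[, delta_height := height.post - height.pre]
```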

Upvotes: 4

Konrad

Reputation: 18625

You may want to consider using dplyr and tidyr, as those packages lend themselves well to this kind of operation and produce neat, readable workflow code.

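library(dplyr)
library(tidyr)
dComplete <- d %>%
    gather(key = indPre, value = valPre, contains("pre")) %>%
    gather(key = indPost, value = valPost, contains("post")) %>%
    # keep only rows where pre and post refer to the same identifier
    filter(sub("height.pre.", "", indPre, fixed = TRUE) ==
           sub("height.post.", "", indPost, fixed = TRUE)) %>%
    mutate(diff = valPost - valPre)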

Preview

Data

I used set.seed(1) for reproducibility:

Original data

set.seed(1)
height.pre.AM   <- rnorm(10)
height.pre.EM   <- rnorm(10)
height.pre.SMH  <- rnorm(10)
height.post.AM  <- rnorm(10)
height.post.EM  <- rnorm(10)
height.post.SMH <- rnorm(10)
d<- data.frame(height.pre.AM, height.pre.EM, height.pre.SMH,
               height.post.AM,height.post.EM,height.post.SMH)

Results preview

> head(dComplete)
         indPre     valPre        indPost    valPost       diff
1 height.pre.AM  0.2426995 height.post.AM -1.0155539 -1.2582534
2 height.pre.AM -0.7978763 height.post.AM  0.7602261  1.5581023
3 height.pre.AM -0.2440429 height.post.AM -1.7585200 -1.5144772
4 height.pre.AM -1.4228071 height.post.AM  0.7663306  2.1891377
5 height.pre.AM  1.6237066 height.post.AM  1.0676800 -0.5560266
6 height.pre.AM  0.3561212 height.post.AM -0.4366372 -0.7927584

If desired, you can later spread your values into one column, depending on how you want to use the data.
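
In newer tidyr releases gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping, assuming tidyr 1.0 or later, which pairs each pre value with the post value for the same identifier in a single step. The names dLong, row and id are introduced here for illustration:

```r
library(dplyr)
library(tidyr)

set.seed(1)
d <- data.frame(
  height.pre.AM   = rnorm(10),
  height.pre.EM   = rnorm(10),
  height.pre.SMH  = rnorm(10),
  height.post.AM  = rnorm(10),
  height.post.EM  = rnorm(10),
  height.post.SMH = rnorm(10)
)

dLong <- d %>%
    # keep track of the original row
    mutate(row = row_number()) %>%
    # ".value" turns the pre/post part of each name into its own column;
    # the remaining part becomes the identifier column "id"
    pivot_longer(cols = starts_with("height"),
                 names_to = c(".value", "id"),
                 names_pattern = "height\\.(pre|post)\\.(.+)") %>%
    mutate(diff = post - pre)
```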

Upvotes: 2
