R computing ratios from multiple related variables in one piece of code

Question

My data is a dataframe with columns for word durations (columns d1, d2, etc.) and phonemic size (columns p1, p2, etc.), like this:

df <- data.frame(
  d1 = rnorm(10),
  d2 = rnorm(10, 0.2),
  d3 = rnorm(10, 0.5),
  d4 = rnorm(10, 1),
  p1 = sample(1:7, 10, replace = TRUE),
  p2 = sample(1:7, 10, replace = TRUE),
  p3 = sample(1:7, 10, replace = TRUE),
  p4 = sample(1:7, 10, replace = TRUE)
)
df

What I'd like to compute is the ratio of the values in d1 divided by the values in p1, d2 by p2, etc. Of course it can be done for each 'pair' of variables separately, like this:

df$dp1 <- df$d1 / df$p1
df$dp2 <- df$d2 / df$p2
df$dp3 <- df$d3 / df$p3
df$dp4 <- df$d4 / df$p4

But the actual data has many more than 4 pairs, so doing this pair by pair is cumbersome and repetitive. Is there a way to obtain the ratios in one go, in one piece of code, in base R? Alternatively, instead of adding the ratios as new variables to the original df, they could be stored in a separate df.

JDG · Accepted Answer

You can do the following. The solution works for any number of d# and p# columns, as long as they are the only columns whose names contain 'd' or 'p'.

Code

library(data.table) # provides %like%

ds = colnames(df)[colnames(df) %like% 'd'] # all d cols
ps = colnames(df)[colnames(df) %like% 'p'] # all p cols

# for each d column, divide it by every p column
mat = lapply(ds, function(x){
  data.frame(sapply(ps, function(y){df[[x]]/df[[y]]}))
})

# names in the same order as the cbind below: d1p1, d1p2, ..., d4p4
names_full = as.vector(sapply(ds, function(x) paste0(x, ps)))
master = do.call(cbind, mat)
colnames(master) = names_full

Result

> head(master)
         d1p1        d1p2        d1p3        d1p4        d2p1        d2p2        d2p3        d2p4         d3p1          d3p2         d3p3          d3p4        d4p1       d4p2
1 -0.78447758 -0.26149253 -0.26149253 -0.15689552 -0.19813960 -0.06604653 -0.06604653 -0.03962792 -0.078350379 -0.0261167931 -0.026116793 -0.0156700759  0.63007362  0.2100245
2 -0.11150154 -0.04778637 -0.08362615 -0.05575077  0.32100537  0.13757373  0.24075402  0.16050268  0.001697144  0.0007273474  0.001272858  0.0008485719  0.63775604  0.2733240
3  0.09862042  0.34517146  0.11505715  0.23011431 -0.20521580 -0.71825529 -0.23941843 -0.47883686  0.218074242  0.7632598470  0.254419949  0.5088398980 -0.07795236 -0.2728333
4  0.27003580  0.81010741  0.32404297  0.40505371 -0.02041131 -0.06123392 -0.02449357 -0.03061696  0.245441958  0.7363258734  0.294530349  0.3681629367  0.31493905  0.9448172
5 -0.09385691 -0.05631415 -0.09385691 -0.05631415 -0.25370631 -0.15222379 -0.25370631 -0.15222379 -0.151414384 -0.0908486306 -0.151414384 -0.0908486306  0.19906969  0.1194418
6 -0.11090289 -0.08317717 -0.05545144 -0.06654173  0.11813430  0.08860072  0.05906715  0.07088058  0.358632072  0.2689740541  0.179316036  0.2151792433  0.58050348  0.4353776
         d4p3       d4p4
1  0.21002454  0.1260147
2  0.47831703  0.3188780
3 -0.09094442 -0.1818888
4  0.37792686  0.4724086
5  0.19906969  0.1194418
6  0.29025174  0.3483021

The %like% operator is from the data.table package.
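
Since the question asks for base R, the column selection can also be done without data.table. This is a minimal sketch, assuming the columns are named exactly d1, d2, ... and p1, p2, ...; the toy data frame is made up for illustration. An anchored regular expression in grep() avoids picking up other columns that merely contain a 'd' or a 'p', such as dp1 from the question.

```r
# Toy data; dp1 stands in for an extra column that contains both 'd' and 'p'
df <- data.frame(d1 = c(1, 2), d2 = c(3, 4),
                 p1 = c(2, 4), p2 = c(1, 2),
                 dp1 = c(0.5, 0.5))

# Anchored patterns: a single letter followed only by digits
ds <- grep("^d[0-9]+$", names(df), value = TRUE)
ps <- grep("^p[0-9]+$", names(df), value = TRUE)

ds  # "d1" "d2"
ps  # "p1" "p2"
```

With an unanchored pattern such as 'd', dp1 would end up in both ds and ps.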

Edit

So evidently the cross divisions weren't necessary. Consider them an extra ;). Reduced solution below.

ds = colnames(df)[colnames(df) %like% 'd'] # all d cols
ps = colnames(df)[colnames(df) %like% 'p'] # all p cols
namestot = paste0(ds, ps)

# element-wise division; pairs the d and p columns by position
mat = df[, ds] / df[, ps]
colnames(mat) = namestot

> mat
          d1p1        d2p2        d3p3       d4p4
1  -0.40484538  0.10443586 -0.02781059 0.06541699
2   0.38268519 -0.08514658 -1.00317641 0.65820613
3  -0.43688685  0.65931482  0.42006917 1.64296707
4  -0.30461343 -0.32322309  0.27494661 0.65208960
5  -0.11160969  0.19414685  0.06839209 0.11104689
6  -0.14843616 -0.11294288  0.03290482 0.37455888
7  -0.40149747  0.19491568  0.78079991 0.82040680
8  -0.05682883 -0.38944966  0.33275446 1.76767351
9   0.01234991  0.77042995  0.22883848 1.54698057
10 -0.11590977  0.30632659  0.83303798 0.27070012
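
The element-wise division pairs columns by position, so it relies on ds and ps coming out in the same order. As a sketch of one way to make the pairing explicit in base R, the numeric suffixes can be matched before dividing. The dp prefix follows the question's naming; the toy data and the suffix-matching step are assumptions for illustration, not part of the original answer.

```r
# Toy data with the p columns deliberately out of order
df <- data.frame(d1 = c(1, 2), d2 = c(3, 4),
                 p2 = c(1, 2), p1 = c(2, 4))

ds <- grep("^d[0-9]+$", names(df), value = TRUE)
ps <- grep("^p[0-9]+$", names(df), value = TRUE)

# Numeric suffix of each column name, e.g. "d2" -> "2"
d_id <- sub("^d", "", ds)
p_id <- sub("^p", "", ps)

# Reorder the p columns so that p<k> lines up with d<k>
ps <- ps[match(d_id, p_id)]

# Divide pairwise and append the ratios to df as dp1, dp2, ...
df[paste0("dp", d_id)] <- df[ds] / df[ps]
df
```

If some d column has no p counterpart, match() returns NA and the subsetting step would fail, so that case would need separate handling.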
