Reputation: 21400
My data is a dataframe with columns for word durations (columns d1, d2, etc.) and phonemic size (columns p1, p2, etc.), like this:
df <- data.frame(
d1 = rnorm(10),
d2 = rnorm(10, 0.2),
d3 = rnorm(10, 0.5),
d4 = rnorm(10, 1),
p1 = sample(1:7, 10, replace = TRUE),
p2 = sample(1:7, 10, replace = TRUE),
p3 = sample(1:7, 10, replace = TRUE),
p4 = sample(1:7, 10, replace = TRUE)
)
df
What I'd like to compute is the ratio of the values in d1 divided by the values in p1, d2 by p2, etc. Of course it can be done for each 'pair' of variables separately, like this:
df$dp1 <- df$d1 / df$p1
df$dp2 <- df$d2 / df$p2
df$dp3 <- df$d3 / df$p3
df$dp4 <- df$d4 / df$p4
But since the actual data has many more than 4 pairs, doing this pair by pair is cumbersome and repetitive. So, is there a way to obtain the ratios in one go, with a single piece of code, in base R? Alternatively, instead of adding the ratios as new variables to the original df, they could be stored in a separate df.
Upvotes: 0
Views: 35
Reputation: 1364
You can do the following. The solution is not sensitive to the number of respective p# and d# columns.
Code
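library(data.table) # provides %like%
ds = colnames(df)[colnames(df) %like% '^d[0-9]+$'] # all d cols (d1, d2, ...); anchored so dp1 etc. are not matched
ps = colnames(df)[colnames(df) %like% '^p[0-9]+$'] # all p cols (p1, p2, ...)
# for each d column, divide it by every p column
mat = lapply(ds, function(x){
data.frame(sapply(ps, function(y){df[[x]]/df[[y]]}))
})
names_full = paste(sapply(ds, function(x) paste0(x, ps))) # d1p1, d1p2, ..., d4p4
master = Reduce(function(...) cbind(...), mat); colnames(master) = names_full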
Result
> head(master)
d1p1 d1p2 d1p3 d1p4 d2p1 d2p2 d2p3 d2p4 d3p1 d3p2 d3p3 d3p4 d4p1 d4p2
1 -0.78447758 -0.26149253 -0.26149253 -0.15689552 -0.19813960 -0.06604653 -0.06604653 -0.03962792 -0.078350379 -0.0261167931 -0.026116793 -0.0156700759 0.63007362 0.2100245
2 -0.11150154 -0.04778637 -0.08362615 -0.05575077 0.32100537 0.13757373 0.24075402 0.16050268 0.001697144 0.0007273474 0.001272858 0.0008485719 0.63775604 0.2733240
3 0.09862042 0.34517146 0.11505715 0.23011431 -0.20521580 -0.71825529 -0.23941843 -0.47883686 0.218074242 0.7632598470 0.254419949 0.5088398980 -0.07795236 -0.2728333
4 0.27003580 0.81010741 0.32404297 0.40505371 -0.02041131 -0.06123392 -0.02449357 -0.03061696 0.245441958 0.7363258734 0.294530349 0.3681629367 0.31493905 0.9448172
5 -0.09385691 -0.05631415 -0.09385691 -0.05631415 -0.25370631 -0.15222379 -0.25370631 -0.15222379 -0.151414384 -0.0908486306 -0.151414384 -0.0908486306 0.19906969 0.1194418
6 -0.11090289 -0.08317717 -0.05545144 -0.06654173 0.11813430 0.08860072 0.05906715 0.07088058 0.358632072 0.2689740541 0.179316036 0.2151792433 0.58050348 0.4353776
d4p3 d4p4
1 0.21002454 0.1260147
2 0.47831703 0.3188780
3 -0.09094442 -0.1818888
4 0.37792686 0.4724086
5 0.19906969 0.1194418
6 0.29025174 0.3483021
The %like% operator is from the data.table package, so load it with library(data.table) before running the code.
Edit
So evidently the cross divisions weren't necessary. Consider them an extra ;). Reduced solution below.
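ds = colnames(df)[colnames(df) %like% '^d[0-9]+$'] # all d cols; anchored so dp1 etc. are not matched
ps = colnames(df)[colnames(df) %like% '^p[0-9]+$'] # all p cols
namestot = paste0(ds, ps)
mat = df[, ds] / df[, ps]; colnames(mat) = namestot # element-wise by position: d1/p1, d2/p2, ...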
> mat
d1p1 d2p2 d3p3 d4p4
1 -0.40484538 0.10443586 -0.02781059 0.06541699
2 0.38268519 -0.08514658 -1.00317641 0.65820613
3 -0.43688685 0.65931482 0.42006917 1.64296707
4 -0.30461343 -0.32322309 0.27494661 0.65208960
5 -0.11160969 0.19414685 0.06839209 0.11104689
6 -0.14843616 -0.11294288 0.03290482 0.37455888
7 -0.40149747 0.19491568 0.78079991 0.82040680
8 -0.05682883 -0.38944966 0.33275446 1.76767351
9 0.01234991 0.77042995 0.22883848 1.54698057
10 -0.11590977 0.30632659 0.83303798 0.27070012
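Since the question asked for base R, the same pairwise division can probably be done without data.table. The following is only a sketch under some assumptions: the columns are named exactly d<number> and p<number>, every d column has a p partner with the same number, and the small two-pair df here is made-up example data, not the asker's. The result names dp1, dp2, ... follow the naming used in the question.

```r
# Hypothetical example data with the same layout as in the question
set.seed(1)
df <- data.frame(
  d1 = rnorm(10),
  d2 = rnorm(10, 0.2),
  p1 = sample(1:7, 10, replace = TRUE),
  p2 = sample(1:7, 10, replace = TRUE)
)

# Select columns named d<number>; anchoring the pattern keeps
# previously added columns such as dp1 out of the selection
ds <- grep("^d[0-9]+$", names(df), value = TRUE)
ps <- sub("^d", "p", ds)           # partner of d1 is p1, of d2 is p2, ...
stopifnot(all(ps %in% names(df)))  # assumption: every d column has a p partner

# Two data frames of equal shape are divided element-wise by position,
# so this yields d1/p1, d2/p2, ...
ratios <- df[ds] / df[ps]
names(ratios) <- sub("^d", "dp", ds)  # dp1, dp2, ...

# Either keep `ratios` as a separate data frame or append it to df
df <- cbind(df, ratios)
```

Building ps from ds by name, rather than matching the p columns separately, pairs the columns by their number instead of by their position in df, which should be a bit more robust if the columns are not stored in the same order.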
Upvotes: 1