Rinot

Reputation: 199

Ratios grouped by two variables using the aggregate function in R?

I've been working with some data to obtain the ratios between the two levels of one variable while taking two other variables into account. I've been using the function aggregate(), which is very useful for calculating means and sums. However, I'm stuck when I want to calculate ratios (divisions). Here is a data frame very similar to my data:

w<-c("A","B","C","D","E","F","A","B","C","D","E","F")
x<-c(1,1,1,1,1,1,2,2,2,2,2,2)
y<-c(3,4,5,6,8,10,3,4,5,7,9,10)
z<-runif(12)
df<-data.frame(w,x,y,z)
df
    w   x   y   z
1   A   1   3   0.93767621
2   B   1   4   0.09169992
3   C   1   5   0.49012926
4   D   1   6   0.90886690
5   E   1   8   0.37058120
6   F   1   10  0.83558267
7   A   2   3   0.42670001
8   B   2   4   0.05656252
9   C   2   5   0.70694423
10  D   2   7   0.13634309
11  E   2   9   0.92065671
12  F   2   10  0.56276176

What I want is the ratio of z between the two levels of x, computed within each combination of w and y. For example, for level "A" of w and level 3 of y, the ratio should be:

df$z[1]/df$z[7]

With the aggregate function, I expected something like this:

final<-aggregate(z~y:w, data=df)

However, I know I'm missing something, because some values of y do not appear in both levels of x (e.g. 7, 8 and 9).
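A quick way to see the problem (a sketch added for illustration, not part of the original question) is to count how many rows each w/y combination has. A combination with only one row, i.e. only one level of x, cannot form a ratio. The names counts and unmatched are hypothetical, and z is left out because only the group sizes matter here.

```r
w <- c("A","B","C","D","E","F","A","B","C","D","E","F")
x <- c(1,1,1,1,1,1,2,2,2,2,2,2)
y <- c(3,4,5,6,8,10,3,4,5,7,9,10)
df <- data.frame(w, x, y)

# Number of rows (i.e. levels of x) observed for each (w, y) combination
counts <- aggregate(x ~ w + y, data = df, FUN = length)
counts

# Combinations seen in only one level of x have nothing to divide by
unmatched <- counts[counts$x < 2, c("w", "y")]
unmatched
```

With this data, four combinations are unmatched: D with y = 6 or 7, and E with y = 8 or 9.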

Any help is welcome!

Upvotes: 1

Views: 2231

Answers (1)

akrun

Reputation: 887088

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)) and group by 'w' and 'y'. If a group has two rows (.N == 2), divide the first value of 'z' by the second; otherwise return 'z' unchanged. Assign (:=) the output to a new column 'z1'.

library(data.table)
setDT(df)[,z1 :=if(.N==2) z[1]/z[2] else z , by = .(w,y)]
df
#    w x  y          z        z1
# 1: A 1  3 0.93767621 2.1975069
# 2: B 1  4 0.09169992 1.6212135
# 3: C 1  5 0.49012926 0.6933068
# 4: D 1  6 0.90886690 0.9088669
# 5: E 1  8 0.37058120 0.3705812
# 6: F 1 10 0.83558267 1.4847894
# 7: A 2  3 0.42670001 2.1975069
# 8: B 2  4 0.05656252 1.6212135
# 9: C 2  5 0.70694423 0.6933068
#10: D 2  7 0.13634309 0.1363431
#11: E 2  9 0.92065671 0.9206567
#12: F 2 10 0.56276176 1.4847894
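Note that z[1]/z[2] relies on the rows of each w/y group being ordered by x, which holds here only because df happens to be sorted by x. A base-R sketch that matches the two levels explicitly (a swapped-in technique, not the answer's method) is an inner join with merge(): split df by level of x and join on w and y, so only combinations present in both levels survive. The seed, the row shuffle, the object names lvl1, lvl2 and paired, and the suffixes are illustrative assumptions.

```r
set.seed(42)  # arbitrary seed; z values will differ from the output printed above
w <- c("A","B","C","D","E","F","A","B","C","D","E","F")
x <- c(1,1,1,1,1,1,2,2,2,2,2,2)
y <- c(3,4,5,6,8,10,3,4,5,7,9,10)
z <- runif(12)
df <- data.frame(w, x, y, z)
df <- df[sample(nrow(df)), ]  # shuffle rows to show the result does not depend on row order

# One table per level of x, keeping only the keys and the value
lvl1 <- df[df$x == 1, c("w", "y", "z")]
lvl2 <- df[df$x == 2, c("w", "y", "z")]

# Inner join on (w, y): combinations missing from either level are dropped
paired <- merge(lvl1, lvl2, by = c("w", "y"), suffixes = c(".x1", ".x2"))

# Ratio of z at x == 1 over z at x == 2
paired$ratio <- paired$z.x1 / paired$z.x2
paired
```

Unlike the data.table version, unmatched combinations are dropped rather than returned as raw z values; passing all = TRUE to merge() keeps them with NA ratios instead.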

If we only want the summary output, we don't need :=

setDT(df)[, list(z=if(.N==2) z[1]/z[2] else z) , by = .(w,y)]

Or, using aggregate:

aggregate(z~w+y, df, FUN=function(x)
            if(length(x)==2) x[1]/x[2] else x)
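Another base-R option, again a sketch rather than part of the original answer, is to reshape df to wide format with reshape(), so each level of x gets its own column (z.1 and z.2). A combination missing one level gets NA in that column, and therefore an NA ratio instead of the raw z. The seed and the names wide and ratio are illustrative assumptions.

```r
set.seed(42)  # arbitrary seed; z values will differ from the output printed above
w <- c("A","B","C","D","E","F","A","B","C","D","E","F")
x <- c(1,1,1,1,1,1,2,2,2,2,2,2)
y <- c(3,4,5,6,8,10,3,4,5,7,9,10)
z <- runif(12)
df <- data.frame(w, x, y, z)

# One row per (w, y); columns z.1 and z.2 hold z for x == 1 and x == 2
wide <- reshape(df, idvar = c("w", "y"), timevar = "x",
                v.names = "z", direction = "wide")

# NA wherever a (w, y) combination lacks one of the two levels of x
wide$ratio <- wide$z.1 / wide$z.2
wide
```

Because the ratio is computed from named columns, it does not depend on the row order within groups.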

Upvotes: 1
