Reputation: 199
I've been working with some data to obtain the ratio between two levels of one variable while taking two other variables into account. I've been using the function aggregate(), which is very useful for calculating means and sums. However, I'm stuck when I want to calculate ratios (divisions). Here is a data frame very similar to my data:
w<-c("A","B","C","D","E","F","A","B","C","D","E","F")
x<-c(1,1,1,1,1,1,2,2,2,2,2,2)
y<-c(3,4,5,6,8,10,3,4,5,7,9,10)
z<-runif(12)
df<-data.frame(w,x,y,z)
df
w x y z
1 A 1 3 0.93767621
2 B 1 4 0.09169992
3 C 1 5 0.49012926
4 D 1 6 0.90886690
5 E 1 8 0.37058120
6 F 1 10 0.83558267
7 A 2 3 0.42670001
8 B 2 4 0.05656252
9 C 2 5 0.70694423
10 D 2 7 0.13634309
11 E 2 9 0.92065671
12 F 2 10 0.56276176
What I want is the ratio of z between the two levels of x, taking the variables w and y into account. So for level "A" of the variable "w" and level "3" of the variable "y", the result should be:
df$z[1]/df$z[7]
With the aggregate function, it should be something like this:
final<-aggregate(z~y:w, data=df)
However, I know that I'm missing something, because some values of y do not appear in both levels of x (e.g. 7, 8 and 9).
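To make the problem concrete, the following sketch counts how many rows each combination of w and y has. It rebuilds the example data with a seed (set.seed(1) is an assumption added only for reproducibility, so the z values differ from those printed above) and uses a hypothetical helper column n.

```r
# Rebuild the example data; the seed is an addition for reproducibility
w <- c("A","B","C","D","E","F","A","B","C","D","E","F")
x <- c(1,1,1,1,1,1,2,2,2,2,2,2)
y <- c(3,4,5,6,8,10,3,4,5,7,9,10)
set.seed(1)
z <- runif(12)
df <- data.frame(w, x, y, z)

# Count rows per (w, y) combination; 2 means both levels of x are present
counts <- aggregate(x ~ w + y, data = df, FUN = length)
names(counts)[3] <- "n"   # hypothetical column name for the row count

counts[counts$n == 2, ]   # combinations where a ratio can be formed
counts[counts$n == 1, ]   # combinations with only one level of x
```

Only the combinations with n equal to 2 have a z value for both levels of x, so a ratio is defined only for those.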
Any help will be welcomed!
Upvotes: 1
Views: 2231
Reputation: 887088
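We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)) and group by 'w' and 'y'. If the number of rows in a group (.N) is 2, we divide the first value of 'z' by the second; otherwise we return 'z' unchanged. Because the rows with x = 1 come before those with x = 2 in this data, the first value is the one for x = 1. We assign (:=) the output to a new column 'z1'.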
library(data.table)
setDT(df)[,z1 :=if(.N==2) z[1]/z[2] else z , by = .(w,y)]
df
# w x y z z1
# 1: A 1 3 0.93767621 2.1975069
# 2: B 1 4 0.09169992 1.6212135
# 3: C 1 5 0.49012926 0.6933068
# 4: D 1 6 0.90886690 0.9088669
# 5: E 1 8 0.37058120 0.3705812
# 6: F 1 10 0.83558267 1.4847894
# 7: A 2 3 0.42670001 2.1975069
# 8: B 2 4 0.05656252 1.6212135
# 9: C 2 5 0.70694423 0.6933068
#10: D 2 7 0.13634309 0.1363431
#11: E 2 9 0.92065671 0.9206567
#12: F 2 10 0.56276176 1.4847894
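The code above picks the numerator and denominator by position, so it depends on the row order. As a hedged variant, the sketch below selects them by the value of x and returns NA for groups that lack one level of x. It assumes that a group with two rows has exactly one row per level of x; the names dt and ratios and the seed are illustrative additions, so the z values differ from those printed above.

```r
library(data.table)

# Rebuild the example data; set.seed() is an addition for reproducibility
set.seed(1)
dt <- data.table(
  w = rep(c("A","B","C","D","E","F"), 2),
  x = rep(1:2, each = 6),
  y = c(3,4,5,6,8,10, 3,4,5,7,9,10),
  z = runif(12)
)

# Ratio of z at x == 1 over z at x == 2 within each (w, y) group.
# Groups with only one level of x get NA instead of the raw z value.
ratios <- dt[, .(ratio = if (.N == 2) z[x == 1] / z[x == 2] else NA_real_),
             by = .(w, y)]
ratios
```

Returning NA rather than the raw z keeps the ratio column from mixing ratios with unmatched values, which may or may not be what is wanted here.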
If we just want the summary output, we don't need to use :=
setDT(df)[, list(z=if(.N==2) z[1]/z[2] else z) , by = .(w,y)]
Or using aggregate
aggregate(z~w+y, df, FUN=function(x)
if(length(x)==2) x[1]/x[2] else x)
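Another base R option, offered only as a sketch, is to split the data by level of x and join the two halves on w and y with merge(). By default merge() keeps only the combinations present in both halves, so the unmatched values of y drop out. The names x1, x2 and wide, the suffixes, and the seed are illustrative assumptions.

```r
# Rebuild the example data; the seed is an addition for reproducibility
set.seed(1)
df <- data.frame(
  w = rep(c("A","B","C","D","E","F"), 2),
  x = rep(1:2, each = 6),
  y = c(3,4,5,6,8,10, 3,4,5,7,9,10),
  z = runif(12)
)

# Split by level of x and join on (w, y); merge() performs an inner join
# by default, so only combinations found in both halves are kept
x1 <- df[df$x == 1, c("w", "y", "z")]
x2 <- df[df$x == 2, c("w", "y", "z")]
wide <- merge(x1, x2, by = c("w", "y"), suffixes = c(".x1", ".x2"))

# Ratio of z at x == 1 over z at x == 2
wide$ratio <- wide$z.x1 / wide$z.x2
wide
```

Passing all = TRUE to merge() would keep the unmatched combinations as well, with NA in the ratio column.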
Upvotes: 1