Reputation: 43
I went through many questions about conditional mutating on this site, but my problem is more complex than those. Here's my data structure:
d = matrix(data = NA, ncol = 3, nrow = 9)
d = as.data.frame(d)
colnames(d) = c('group', 'type', 'v1')
d$group = c(1,1,1,2,2,2,2,2,2)
d$type = c(1,2,3,1,2,3,3,3,3)
d$v1 = c(43,21,234,5,56,6,56,4,345)
group type v1
1 1 43
1 2 21
1 3 234
2 1 5
2 2 56
2 3 6
2 3 56
2 3 4
2 3 345
It has two grouping variables: group and type. I need to create a new variable v2, so that:
In each group, if type == 1, v2 = 1.
In each group, if type == 2, v2 = [v1(type2) - v1(type1)] / [v1(type2) + v1(type1)]. For example, in group 1, when type == 2, v2 = (21 - 43) / (21 + 43).
In each group, if type == 3, apply the same function: v2 = [v1(type3) - v1(type1)] / [v1(type3) + v1(type1)]. For example, in group 1, when type == 3, v2 = (234 - 43) / (234 + 43).
My dataset has more than 200 groups, and the number of type 3 rows differs from group to group.
Here's what I did: I created a function of the formula:
flsm = function(x, y){(x - y) / (x + y)}
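As a quick sanity check, the helper can be run on the two worked examples for group 1. This is only a sketch that repeats the arithmetic given above; nothing beyond the flsm definition is assumed.

```r
# Same helper as in the question
flsm <- function(x, y) (x - y) / (x + y)

# Group 1, type 2 against type 1: (21 - 43) / (21 + 43) = -22/64
type2_example <- flsm(21, 43)

# Group 1, type 3 against type 1: (234 - 43) / (234 + 43) = 191/277
type3_example <- flsm(234, 43)

print(c(type2_example, type3_example))
```

The function is vectorized, so x can be a whole column while y is a single type 1 value.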
And then I tried to calculate v2:
d %>% group_by(group) %>%
mutate(v2 = ifelse(type == 2,
flsm(v1, type == 1[v1])),
ifelse(type == 3, flsm(v1, type == 1[v1])), 1)
It returned the following error and warnings:
Error: argument "no" is missing, with no default
In addition: Warning messages:
1: In is.na(e1) | is.na(e2) :
longer object length is not a multiple of shorter object length
2: In `==.default`(c(1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), 1[c(6.27, :
longer object length is not a multiple of shorter object length
I feel like I'm not approaching this the right way. Any idea how to calculate v2?
Upvotes: 1
Reputation: 886938
Here is an option with data.table, which assigns in place:
library(data.table)
setDT(d)[, v2:= flsm(v1, d$v1[d$group==unique(group) & d$type ==1]) , group
][type==1, v2 := 1][]
# group type v1 v2
#1: 1 1 43 1.00000000
#2: 1 2 21 -0.34375000
#3: 1 3 234 0.68953069
#4: 2 1 5 1.00000000
#5: 2 2 56 0.83606557
#6: 2 3 6 0.09090909
#7: 2 3 56 0.83606557
#8: 2 3 4 -0.11111111
#9: 2 3 345 0.97142857
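Since j is evaluated once per group when by = group is supplied, the type 1 value can also be looked up inside the group instead of through d$v1. The following is a minimal sketch of that variant, assuming every group has exactly one type == 1 row (true for the example data, not confirmed for the full dataset).

```r
library(data.table)

# Example data and helper from the question
d <- data.table(
  group = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  type  = c(1, 2, 3, 1, 2, 3, 3, 3, 3),
  v1    = c(43, 21, 234, 5, 56, 6, 56, 4, 345)
)
flsm <- function(x, y) (x - y) / (x + y)

# Within each group, v1[type == 1] is that group's reference value
# (assumption: exactly one type 1 row per group).
d[, v2 := flsm(v1, v1[type == 1]), by = group]

# Type 1 rows are defined as 1; flsm(v1, v1) would otherwise give 0.
d[type == 1, v2 := 1]

print(d)
```

If a group had no type 1 row, v1[type == 1] would have length zero and the assignment for that group would need separate handling.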
Upvotes: 0
Reputation: 43334
You're trying some weird subsetting: 1[v1] indexes the constant 1 by v1. Going by your description instead, you can use the bare column name to refer to the variable within the group and .$column_name to refer to the entire column, which lets you do:
d %>% group_by(group) %>%
mutate(v2 = ifelse(type == 1, 1,
flsm(v1, .$v1[.$group == unique(group) & .$type == 1])))
## Source: local data frame [9 x 4]
## Groups: group [2]
##
## group type v1 v2
## <int> <int> <int> <dbl>
## 1 1 1 43 1.00000000
## 2 1 2 21 -0.34375000
## 3 1 3 234 0.68953069
## 4 2 1 5 1.00000000
## 5 2 2 56 0.83606557
## 6 2 3 6 0.09090909
## 7 2 3 56 0.83606557
## 8 2 3 4 -0.11111111
## 9 2 3 345 0.97142857
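Two details are worth spelling out. The "argument "no" is missing" error in the question comes from the first ifelse() being closed right after its second argument, so it never receives a value for the non-matching case. And because types 2 and 3 use the same formula, a single ifelse() is enough. The sketch below also looks up the type 1 value with the bare column inside the group, which assumes each group has exactly one type == 1 row.

```r
library(dplyr)

# Example data and helper from the question
d <- data.frame(
  group = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  type  = c(1, 2, 3, 1, 2, 3, 3, 3, 3),
  v1    = c(43, 21, 234, 5, 56, 6, 56, 4, 345)
)
flsm <- function(x, y) (x - y) / (x + y)

res <- d %>%
  group_by(group) %>%
  # Inside a grouped mutate(), v1 and type refer to the current group only,
  # so v1[type == 1] is that group's reference value
  # (assumption: exactly one type 1 row per group).
  mutate(v2 = ifelse(type == 1, 1, flsm(v1, v1[type == 1]))) %>%
  ungroup()

print(res)
```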
Upvotes: 2
Reputation: 23216
Here's how to do it in base R. From here, if you want to use a package to do the same thing, it should be straightforward.
# df1 is assumed to be the question's data with columns named gr, tye and v1
df1$v2 <- NA
# loop over each distinct group once, not once per row
for (i in unique(df1$gr)) {
  # in each group, if tye == 1, v2 = 1
  df1$v2[df1$tye == 1 & df1$gr == i] <- 1
  # in each group, if tye == 2, v2 = [v1(tye2) - v1(tye1)] / [v1(tye2) + v1(tye1)]
  df1$v2[df1$tye == 2 & df1$gr == i] <-
    (df1$v1[df1$tye == 2 & df1$gr == i] - df1$v1[df1$tye == 1 & df1$gr == i]) /
    (df1$v1[df1$tye == 2 & df1$gr == i] + df1$v1[df1$tye == 1 & df1$gr == i])
  # in each group, if tye == 3, apply the same function:
  # v2 = [v1(tye3) - v1(tye1)] / [v1(tye3) + v1(tye1)]
  df1$v2[df1$tye == 3 & df1$gr == i] <-
    (df1$v1[df1$tye == 3 & df1$gr == i] - df1$v1[df1$tye == 1 & df1$gr == i]) /
    (df1$v1[df1$tye == 3 & df1$gr == i] + df1$v1[df1$tye == 1 & df1$gr == i])
}
  gr tye  v1          v2
1  1   1  43  1.00000000
2  1   2  21 -0.34375000
3  1   3 234  0.68953069
4  2   1   5  1.00000000
5  2   2  56  0.83606557
6  2   3   6  0.09090909
7  2   3  56  0.83606557
8  2   3   4 -0.11111111
9  2   3 345  0.97142857
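With more than 200 groups, the loop can also be replaced by a vectorized lookup. The sketch below is one possible base R variant, not the method used above: it uses the question's column names (group, type, v1) and assumes each group has exactly one type == 1 row. The names ref and base_v1 are illustrative only.

```r
# Example data from the question
d <- data.frame(
  group = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  type  = c(1, 2, 3, 1, 2, 3, 3, 3, 3),
  v1    = c(43, 21, 234, 5, 56, 6, 56, 4, 345)
)

# Named vector mapping each group to its type 1 value
# (assumption: exactly one type 1 row per group).
ref <- setNames(d$v1[d$type == 1], d$group[d$type == 1])

# For every row, look up the type 1 value of that row's group by name
base_v1 <- unname(ref[as.character(d$group)])

# Type 1 rows get 1; all other rows get (v1 - base) / (v1 + base)
d$v2 <- ifelse(d$type == 1, 1, (d$v1 - base_v1) / (d$v1 + base_v1))

print(d)
```

The lookup goes through character names rather than positions, so it does not depend on the groups being numbered 1, 2, 3 and so on.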
Upvotes: 1