
Reputation: 43

Complex conditional mutating

I went through many conditional mutating questions on this site, but my problem is more complex than those. Here's my data structure:

d = matrix(data = NA, ncol = 3, nrow = 9)
d = as.data.frame(d)
colnames(d) = c('group', 'type', 'v1')
d$group = c(1,1,1,2,2,2,2,2,2)
d$type = c(1,2,3,1,2,3,3,3,3)
d$v1 = c(43,21,234,5,56,6,56,4,345)


group  type v1
1   1   43  
1   2   21  
1   3   234 
2   1   5   
2   2   56  
2   3   6   
2   3   56  
2   3   4   
2   3   345

It has two grouping variables: group and type. I need to create a new variable v2, so that:

in each group, if type == 1, v2 = 1
in each group, if type == 2, v2 = [v1(type2) - v1(type1)] / [v1(type2) + v1(type1)]. For example, in group 1, when type == 2, v2 = (21 - 43) / (21 + 43)
in each group, if type == 3, apply the same function: v2 = [v1(type3) - v1(type1)] / [v1(type3) + v1(type1)]. For example, in group 1, when type == 3, v2 = (234 - 43) / (234 + 43)

My dataset has more than 200 groups, and the number of type 3 rows differs from group to group.

Here's what I did: I created a function of the formula:

flsm = function(x, y){(x - y) / (x + y)}
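As a quick sanity check, here is a minimal sketch that applies flsm to the two group 1 examples from the list above. The names r2 and r3 are just illustrative:

```r
# flsm as defined in the question
flsm <- function(x, y) (x - y) / (x + y)

# group 1, type 2 against type 1: (21 - 43) / (21 + 43)
r2 <- flsm(21, 43)

# group 1, type 3 against type 1: (234 - 43) / (234 + 43)
r3 <- flsm(234, 43)

print(c(r2, r3))
```

The two values should come out at roughly -0.34375 and 0.68953, which are the v2 values expected for the type 2 and type 3 rows of group 1.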

And then I tried to calculate v2:

d %>% group_by(group) %>% 
    mutate(v2 = ifelse(type == 2, 
                       flsm(v1, type == 1[v1])),
                       ifelse(type == 3, flsm(v1, type == 1[v1])), 1)

It returned the following error and warnings:

Error: argument "no" is missing, with no default
In addition: Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In `==.default`(c(1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), 1[c(6.27,  :longer object length is not a multiple of shorter object length

I feel like I'm not taking the right approach. Any idea how to calculate v2?

Upvotes: 1

Answers (3)

akrun

Reputation: 887831

Here is an option with data.table, which assigns in place (by reference):

library(data.table)
setDT(d)[,  v2:= flsm(v1, d$v1[d$group==unique(group) & d$type ==1]) , group
                     ][type==1, v2 := 1][]
#   group type  v1          v2
#1:     1    1  43  1.00000000
#2:     1    2  21 -0.34375000
#3:     1    3 234  0.68953069
#4:     2    1   5  1.00000000
#5:     2    2  56  0.83606557
#6:     2    3   6  0.09090909
#7:     2    3  56  0.83606557
#8:     2    3   4 -0.11111111
#9:     2    3 345  0.97142857
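Because the calculation already runs by group, the type 1 value can probably also be looked up inside the group rather than through d$. A minimal sketch of that variant, assuming each group has exactly one type 1 row (the data is rebuilt here so the block runs on its own):

```r
library(data.table)

flsm <- function(x, y) (x - y) / (x + y)

d <- data.table(
  group = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  type  = c(1, 2, 3, 1, 2, 3, 3, 3, 3),
  v1    = c(43, 21, 234, 5, 56, 6, 56, 4, 345)
)

# Within each group, v1[type == 1] is that group's type 1 value.
# Assumption: exactly one type 1 row per group.
d[, v2 := flsm(v1, v1[type == 1]), by = group]

# Type 1 rows are defined to be 1, so overwrite them afterwards
d[type == 1, v2 := 1]

print(d)
```

If a group had no type 1 row, or more than one, the lookup would no longer return a single value, so that case would need separate handling.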

Upvotes: 0

alistaire

Reputation: 43354

You're trying some weird subsetting, indexing 1 by v1. Going by your descriptions instead, you can use the bare column name to refer to the variable within the group and .$column_name to refer to the entire column, which lets you do:

d %>% group_by(group) %>% 
    mutate(v2 = ifelse(type == 1, 1, 
                       flsm(v1, .$v1[.$group == unique(group) & .$type == 1])))

## Source: local data frame [9 x 4]
## Groups: group [2]
## 
##   group  type    v1          v2
##   <int> <int> <int>       <dbl>
## 1     1     1    43  1.00000000
## 2     1     2    21 -0.34375000
## 3     1     3   234  0.68953069
## 4     2     1     5  1.00000000
## 5     2     2    56  0.83606557
## 6     2     3     6  0.09090909
## 7     2     3    56  0.83606557
## 8     2     3     4 -0.11111111
## 9     2     3   345  0.97142857
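Since bare column names already refer to the current group inside a grouped mutate, a shorter form that skips .$ should also work. This is a sketch under the assumption that every group has exactly one type 1 row; res is just an illustrative name:

```r
library(dplyr)

flsm <- function(x, y) (x - y) / (x + y)

d <- data.frame(
  group = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  type  = c(1, 2, 3, 1, 2, 3, 3, 3, 3),
  v1    = c(43, 21, 234, 5, 56, 6, 56, 4, 345)
)

res <- d %>%
  group_by(group) %>%
  # v1[type == 1] is evaluated per group, so it picks that group's
  # type 1 value (assumes exactly one type 1 row per group)
  mutate(v2 = ifelse(type == 1, 1, flsm(v1, v1[type == 1]))) %>%
  ungroup()

print(res)
```

Note that both branches sit inside a single ifelse call here. In the question's code the first ifelse was closed right after its second argument, which is why R complained that argument "no" was missing.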

Upvotes: 2

Hack-R

Reputation: 23231

Here's how to do it in base R. From here, if you want to use a package to do the same thing, it should be straightforward.

# df1 is the question's data, here with columns named gr, tye and v1
df1$v2 <- NA

# loop over each distinct group once rather than over every row
for (i in unique(df1$gr)) {
  # in each group, if tye == 1, v2 = 1
  df1$v2[df1$tye == 1 & df1$gr == i] <- 1

  # in each group, if tye == 2, v2 = [v1(tye2) - v1(tye1)] / [v1(tye2) + v1(tye1)]
  df1$v2[df1$tye == 2 & df1$gr == i] <- (df1$v1[df1$tye == 2 & df1$gr == i] - df1$v1[df1$tye == 1 & df1$gr == i]) /
    (df1$v1[df1$tye == 2 & df1$gr == i] + df1$v1[df1$tye == 1 & df1$gr == i])

  # in each group, if tye == 3, apply the same function:
  # v2 = [v1(tye3) - v1(tye1)] / [v1(tye3) + v1(tye1)]
  df1$v2[df1$tye == 3 & df1$gr == i] <- (df1$v1[df1$tye == 3 & df1$gr == i] - df1$v1[df1$tye == 1 & df1$gr == i]) /
    (df1$v1[df1$tye == 3 & df1$gr == i] + df1$v1[df1$tye == 1 & df1$gr == i])
}

  gr tye  v1          v2
1  1   1  43  1.00000000
2  1   2  21 -0.34375000
3  1   3 234  0.68953069
4  2   1   5  1.00000000
5  2   2  56  0.83606557
6  2   3   6  0.09090909
7  2   3  56  0.83606557
8  2   3   4 -0.11111111
9  2   3 345  0.97142857
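With 200+ groups, a loop-free base R version may be easier to read. The sketch below uses the question's column names (group, type) rather than gr and tye, and assumes exactly one type 1 row per group; baseline and ref are illustrative names:

```r
flsm <- function(x, y) (x - y) / (x + y)

d <- data.frame(
  group = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  type  = c(1, 2, 3, 1, 2, 3, 3, 3, 3),
  v1    = c(43, 21, 234, 5, 56, 6, 56, 4, 345)
)

# Type 1 value of each group, named by group
# (assumes exactly one type 1 row per group)
baseline <- setNames(d$v1[d$type == 1], d$group[d$type == 1])

# For every row, look up its own group's type 1 value by name
ref <- unname(baseline[as.character(d$group)])

# Type 1 rows get 1; all other rows get flsm against the group baseline
d$v2 <- ifelse(d$type == 1, 1, flsm(d$v1, ref))

print(d)
```

The lookup by name replaces the repeated df1$tye == ... & df1$gr == i subsetting, and the same line handles type 2 and type 3 because both use the same formula.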

Upvotes: 1
