ppotatomato

Reputation: 57

R loop to do calculation based on condition from another column

I have a data frame that looks like this:

(Screenshot of the data frame: https://i.sstatic.net/EfIy2.png)

The assignment is based on column V1. If V1 = 1, the row belongs to the treatment group and I take its value from column Yi(1); if V1 = 0, the row belongs to the control group and I take its value from Yi(0).

I need to do this for all six columns, V1 to V6. I have done it for V1, but I am having trouble extending it to the other columns. Could a for-loop do the job? My ultimate aim is to store all the differences in a data frame.

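    # Rows in the treatment group (V1 = 1) and the control group (V1 = 0)
    t_group <- which(combined_df$V1 > 0)
    c_group <- which(combined_df$V1 <= 0)

    # Mean outcome under treatment minus mean outcome under control
    diff <- mean(combined_df$`Yi(1)`[t_group]) - mean(combined_df$`Yi(0)`[c_group])
    diff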
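A for-loop can do the job. Below is a minimal sketch, assuming the real data frame is called `combined_df` and has columns `Yi(1)`, `Yi(0)` and `V1` to `V6` as described above. The small `combined_df` built at the top is made-up toy data so the block runs on its own; replace it with your real data and set `v_cols` to `paste0("V", 1:6)`.

```r
# Toy stand-in for the question's combined_df (made-up values, for illustration only)
combined_df <- data.frame(
  `Yi(0)` = 1:10,
  `Yi(1)` = 21:30,
  V1 = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
  V2 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  check.names = FALSE  # keep the names Yi(0) / Yi(1) exactly as written
)

v_cols <- c("V1", "V2")  # for the real data: paste0("V", 1:6)

# Pre-allocate one row per assignment column
results <- data.frame(column = v_cols, diff = NA_real_)

for (i in seq_along(v_cols)) {
  v <- combined_df[[v_cols[i]]]
  t_group <- which(v > 0)   # treated rows, same rule as the original code
  c_group <- which(v <= 0)  # control rows
  results$diff[i] <- mean(combined_df$`Yi(1)`[t_group]) -
    mean(combined_df$`Yi(0)`[c_group])
}

results
```

With the toy data this gives a difference of 15 for V1 and 21 for V2. Pre-allocating `results` and filling it by index avoids growing a data frame inside the loop.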

Upvotes: 0

Views: 1127

Answers (2)

Skaqqs

Reputation: 4140

Data (generated with `runif()`, so your numbers will differ):

df <- data.frame(Yi0 =  runif(n=10, min = 0, max = 100),
                 Yi1 = runif(n=10, min = 0, max = 100),
                 V1 = c(1,1,1,1,1,0,0,0,0,0),
                 V2 = c(0,1,0,1,0,1,0,1,0,1))
df
#>         Yi0      Yi1 V1 V2
#> 1  96.92925 78.60610  1  0
#> 2  13.11842 30.26470  1  1
#> 3  57.02284 60.18323  1  0
#> 4  38.96841 51.58151  1  1
#> 5  41.48591 47.38174  1  0
#> 6  91.79000 89.82191  0  1
#> 7  41.61315 25.03463  0  0
#> 8  64.37319 29.87603  0  1
#> 9  10.10971 14.97266  0  0
#> 10 52.21190 52.87794  0  1

Write your logic as a function that works on one column:

pm <- function(x) {
  # Take Yi1 for treated rows (column x == 1) and Yi0 for control rows (== 0);
  # any other value in column x becomes NA
  ifelse(df[, x] == 1, df[, "Yi1"],
         ifelse(df[, x] == 0, df[, "Yi0"], NA))
}

Apply that function to all columns of interest:

ab <- sapply(X = 3:4, FUN=pm)
ab
#>            [,1]      [,2]
#>  [1,] 51.980213  8.064173
#>  [2,] 30.616215 30.616215
#>  [3,] 94.817918 34.882479
#>  [4,] 99.261267 99.261267
#>  [5,] 32.276397 60.470493
#>  [6,] 13.596881 81.908181
#>  [7,]  2.975445  2.975445
#>  [8,] 56.496649 43.317387
#>  [9,] 56.218272 56.218272
#> [10,] 93.022714  1.478989

`X` is the vector of column positions to loop over (here `3:4`, i.e. V1 and V2; for your data, the positions of V1 to V6), and `FUN` is the function that holds your logic.
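Instead of typing the column positions by hand, you can look them up by name. A sketch, assuming every assignment column is named "V" followed by digits (as V1 to V6 are in the question) and that no other column matches that pattern:

```r
# Example data with the same layout as above (fixed values instead of runif)
df <- data.frame(Yi0 = 1:10,
                 Yi1 = 21:30,
                 V1 = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
                 V2 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1))

# Positions of every column whose name is "V" followed only by digits
v_idx <- grep("^V[0-9]+$", names(df))
v_idx
```

Here `v_idx` is `c(3, 4)`, and it can be passed as `X`, e.g. `sapply(X = v_idx, FUN = pm)`. On data with V1 to V6 it picks up all six columns without changes.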

Calculate the difference between the means of the two resulting columns:

mean(ab[,1]) - mean(ab[,2])
#> [1] 11.20691

Edit

Calculate the treated-minus-control difference in means within each column:

pm2 <- function(x) {
  # Mean of Yi1 where column x == 1, minus mean of Yi0 where column x == 0
  mean(df[df[, x] == 1, "Yi1"]) - mean(df[df[, x] == 0, "Yi0"])
}

sapply(3:4, pm2)
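To end up with a data frame of differences (the question's stated goal), a variant of `pm2` can take the data as an argument instead of reading the global `df`. This is a sketch: the helper name `diff_in_means` is hypothetical, and it assumes outcome columns named `Yi1` and `Yi0` as in the example data.

```r
# Example data (fixed values so the result is reproducible)
df <- data.frame(Yi0 = 1:10,
                 Yi1 = 21:30,
                 V1 = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
                 V2 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1))

# Hypothetical helper: treated mean of Yi1 minus control mean of Yi0 for one column
diff_in_means <- function(data, col) {
  treated <- data[[col]] == 1
  control <- data[[col]] == 0
  mean(data$Yi1[treated]) - mean(data$Yi0[control])
}

v_cols <- c("V1", "V2")  # use paste0("V", 1:6) for V1 to V6
# vapply guarantees one numeric value per column and names the result by column
diffs <- vapply(v_cols, function(col) diff_in_means(df, col), numeric(1))

# One row per assignment column
result_df <- data.frame(column = names(diffs), diff = unname(diffs))
result_df
```

With these values, `result_df` holds 15 for V1 and 21 for V2.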

Upvotes: 1

tamtam

Reputation: 3671

With dplyr you can try this:

Data

df <- data.frame(Yi0 =  1:10,
                 Yi1 = 21:30,
                 V1 = c(1,1,1,1,1,0,0,0,0,0),
                 V2 = c(0,1,0,1,0,1,0,1,0,1))

Code

library(dplyr)

df %>%
  summarise(across(V1:V2, 
                ~ mean(df %>% 
                         filter(.x == 1) %>%
                         pull(Yi1), na.rm = TRUE) -
                  mean(df %>% 
                         filter(.x == 0) %>%
                         pull(Yi0), na.rm = TRUE)))

For your data, change `V1:V2` to `V1:V6`.

Output:

  V1 V2
1 15 21
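An alternative sketch, assuming the tidyr package is installed alongside dplyr: reshaping the V columns to long format turns each column into a group, so the result has one row per column instead of one column per V. This is a swapped-in technique, not the approach used above.

```r
library(dplyr)
library(tidyr)

df <- data.frame(Yi0 = 1:10,
                 Yi1 = 21:30,
                 V1 = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
                 V2 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1))

result <- df %>%
  # One row per (original row, V column) pair; assumes only the V columns start with "V"
  pivot_longer(cols = starts_with("V"),
               names_to = "column", values_to = "assigned") %>%
  group_by(column) %>%
  # Treated mean of Yi1 minus control mean of Yi0, per V column
  summarise(diff = mean(Yi1[assigned == 1]) - mean(Yi0[assigned == 0]))

result
```

On this data, `result` has the rows V1 = 15 and V2 = 21, matching the output above, and `starts_with("V")` picks up V1 to V6 automatically on the real data.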

Upvotes: 1
