Reputation: 57
I have a data frame which looks like this
Assignment is based on column V1: if V1 = 1, the row belongs to the treatment group and I take the value from column Yi(1); if V1 = 0, it belongs to the control group and I take the value from Yi(0).
Ultimately, I need to do this for all six columns, V1 to V6. I have done it for V1 but am having trouble extending the code to the other columns. Could a for-loop do the job? My ultimate aim is to store all the differences in a data frame.
t_group <- which(combined_df$V1>0)
c_group <- which(combined_df$V1<=0)
diff <- mean(combined_df$`Yi(1)`[t_group]) - mean(combined_df$`Yi(0)`[c_group])
diff
Upvotes: 0
Views: 1127
Reputation: 4140
Data:
df <- data.frame(Yi0 = runif(n=10, min = 0, max = 100),
Yi1 = runif(n=10, min = 0, max = 100),
V1 = c(1,1,1,1,1,0,0,0,0,0),
V2 = c(0,1,0,1,0,1,0,1,0,1))
df
#> Yi0 Yi1 V1 V2
#> 1 96.92925 78.60610 1 0
#> 2 13.11842 30.26470 1 1
#> 3 57.02284 60.18323 1 0
#> 4 38.96841 51.58151 1 1
#> 5 41.48591 47.38174 1 0
#> 6 91.79000 89.82191 0 1
#> 7 41.61315 25.03463 0 0
#> 8 64.37319 29.87603 0 1
#> 9 10.10971 14.97266 0 0
#> 10 52.21190 52.87794 0 1
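Because runif() draws new values on every run, the numbers printed in this answer will differ from what you see on your machine. A minimal sketch of a reproducible version, assuming an arbitrary seed (123 is chosen only for illustration) and a hypothetical name df_seeded so it does not overwrite df:

```r
# Fix the random number generator state so the draws are repeatable.
# The seed value 123 is arbitrary and only an assumption for this sketch.
set.seed(123)

df_seeded <- data.frame(Yi0 = runif(n = 10, min = 0, max = 100),
                        Yi1 = runif(n = 10, min = 0, max = 100),
                        V1 = c(1,1,1,1,1,0,0,0,0,0),
                        V2 = c(0,1,0,1,0,1,0,1,0,1))
df_seeded
```

Re-running both lines together (set.seed() and then data.frame()) should give the same values each time.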
Apply your logic as a function to work on one column:
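pm <- function(x) {
  # V == 1 (treatment) takes Yi1, V == 0 (control) takes Yi0;
  # any other value gives NA instead of coercing the result to character
  ifelse(
    test = df[,x] == 1,
    yes = df[,"Yi1"],
    no = ifelse(
      test = df[,x] == 0,
      yes = df[,"Yi0"],
      no = NA_real_))
}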
Apply that function to all columns of interest:
ab <- sapply(X = 3:4, FUN=pm)
ab
#> [,1] [,2]
#> [1,] 51.980213 8.064173
#> [2,] 30.616215 30.616215
#> [3,] 94.817918 34.882479
#> [4,] 99.261267 99.261267
#> [5,] 32.276397 60.470493
#> [6,] 13.596881 81.908181
#> [7,] 2.975445 2.975445
#> [8,] 56.496649 43.317387
#> [9,] 56.218272 56.218272
#> [10,] 93.022714 1.478989
X is a vector of column numbers (3:4 here; use the positions of V1 to V6 in your data) and FUN is the function where your logic is defined.
Calculate difference of means:
mean(ab[,1]) - mean(ab[,2])
#> [1] 11.20691
Or calculate the treatment-minus-control difference in means for each column directly:
pm2 <- function(x) {
mean(df[df[,x]==1,"Yi1"]) - mean(df[df[,x]==0,"Yi0"])
}
sapply(3:4,pm2)
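To get the differences into a data frame, as the question asks, something along these lines might work. This is only a sketch: it uses small fixed example data rather than your real data, and the names diff_means, cols and result are made up for illustration.

```r
# Fixed example data (hypothetical values, not the asker's data)
df <- data.frame(Yi0 = 1:10,
                 Yi1 = 21:30,
                 V1 = c(1,1,1,1,1,0,0,0,0,0),
                 V2 = c(0,1,0,1,0,1,0,1,0,1))

# Mean of Yi1 among treated rows minus mean of Yi0 among control rows,
# for one assignment column given by name
diff_means <- function(col) {
  mean(df$Yi1[df[[col]] == 1]) - mean(df$Yi0[df[[col]] == 0])
}

# Assignment columns to process; extend to paste0("V", 1:6) for six columns
cols <- c("V1", "V2")

# One row per assignment column, with its difference in means
result <- data.frame(assignment = cols,
                     diff = unname(vapply(cols, diff_means, numeric(1))),
                     row.names = NULL)
result
#>   assignment diff
#> 1         V1   15
#> 2         V2   21
```

A for-loop that fills a pre-allocated numeric vector would do the same job; vapply() is used here only because it checks that each result is a single number.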
Upvotes: 1
Reputation: 3671
With dplyr you can try this:
Data
df <- data.frame(Yi0 = 1:10,
Yi1 = 21:30,
V1 = c(1,1,1,1,1,0,0,0,0,0),
V2 = c(0,1,0,1,0,1,0,1,0,1))
Code
library(dplyr)

df %>%
  summarise(across(V1:V2,
                   # treated mean of Yi1 minus control mean of Yi0
                   ~ mean(df %>%
                            filter(.x == 1) %>%
                            pull(Yi1), na.rm = TRUE) -
                     mean(df %>%
                            filter(.x == 0) %>%
                            pull(Yi0), na.rm = TRUE)))
For your data you may edit V1:V2 to V1:V6.
Output
V1 V2
1 15 21
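A slightly shorter variant is possible because the function passed to across() can refer to other columns of the data frame by name. This is a sketch under the same assumptions as above (columns named Yi0, Yi1, V1, V2); res is just an illustrative name.

```r
library(dplyr)

df <- data.frame(Yi0 = 1:10,
                 Yi1 = 21:30,
                 V1 = c(1,1,1,1,1,0,0,0,0,0),
                 V2 = c(0,1,0,1,0,1,0,1,0,1))

# .x is the current assignment column; Yi1 and Yi0 are looked up in df
res <- df %>%
  summarise(across(V1:V2,
                   ~ mean(Yi1[.x == 1], na.rm = TRUE) -
                     mean(Yi0[.x == 0], na.rm = TRUE)))
res
#>   V1 V2
#> 1 15 21
```

This avoids filtering df again inside each call, but the result should be the same.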
Upvotes: 1