Reputation: 1499
With the following data frame:
A1 A2 EFF FRQ
A G 0.0125 0.4578
T C 0.0143 0.1293
T C -0.017 0.8984
A G -0.018 0.8945
A G -0.009 0.8652
A G 0.0001 0.3931
I would like to make two probability "draws" from the effect size based on the FRQ
column. I want to make a new column called sim_1
where 45.78% of the time, the EFF
keeps it's sign, and where 54.22% of the time, the EFF
switches it's sign. I would like to then sum two of these random events for each row. For example, let's say that two random numbers 0-100 are generated. 78.33 and 32.16. I will take anything < 45.78 as being indicative of keeping the EFF
the same. Since I randomly rolled a 78 and 32, the sum will be -0.0125 (for the 78.33 roll) and 0.0125 for the (32.16) roll, which equals 0.
In the second row, let's say that I roll two random numbers of 88.22 and 67.10. Because neither of these numbers are below 12.93, then the EFF
sign will be flipped for both the 88.22 and 67.10 roll, leaving us with a sum of -0.0286 (-0.0143 + -0.0143).
I would like to do 500 simulation columns in this manner so that the final output looks like:
A1 A2 EFF FRQ Sim_1 Sim_2 Sim_3...
A G 0.0125 0.4578 0 - -
T C 0.0143 0.1293 -0.0286 - -
T C -0.017 0.8984 - - -
A G -0.018 0.8945 - - -
A G -0.009 0.8652 - - -
A G 0.0001 0.3931 - - -
Note: If you generate an output file, it may not match mine since it is based on randomness.
Upvotes: 1
Views: 575
Reputation: 15708
Using your data:
tmp_df <- structure(list(A1 = structure(c(1L, 2L, 2L, 1L, 1L, 1L),
.Label = c("A", "T"), class = "factor"),
A2 = structure(c(2L, 1L, 1L, 2L, 2L, 2L),
.Label = c("C", "G"), class = "factor"),
EFF = c(0.0125, 0.0143, -0.017, -0.018, -0.009, 1e-04),
FRQ = c(0.4578, 0.1293, 0.8984, 0.8945, 0.8652, 0.3931)),
.Names = c("A1", "A2", "EFF", "FRQ"), class = "data.frame", row.names = c(NA, -6L))
Do the following
set.seed(0)
tmp_results <- lapply(1:500, function(i) rowSums(2 * (0.5 - (matrix(runif(nrow(tmp_df) * 2), ncol = 2) >= tmp_df$FRQ)) * tmp_df$EFF))
tmp_out <- as.data.frame(tmp_results)
names(tmp_out) <- paste("Sim", 1:500)
tmp_out <- cbind(tmp_df, tmp_out)
producing:
> tmp_out[, 1:10]
A1 A2 EFF FRQ Sim 1 Sim 2 Sim 3 Sim 4 Sim 5 Sim 6
1 A G 0.0125 0.4578 -0.0250 0.0000 0.0250 -0.0250 0.0000 0.0250
2 T C 0.0143 0.1293 -0.0286 -0.0286 -0.0286 -0.0286 0.0000 -0.0286
3 T C -0.0170 0.8984 -0.0340 -0.0340 -0.0340 -0.0340 -0.0340 -0.0340
4 A G -0.0180 0.8945 -0.0360 0.0000 -0.0360 -0.0360 -0.0360 -0.0360
5 A G -0.0090 0.8652 0.0000 -0.0180 -0.0180 -0.0180 -0.0180 0.0000
6 A G 0.0001 0.3931 0.0002 -0.0002 -0.0002 0.0000 -0.0002 0.0000
Explanation of the lapply
step:
1) matrix(runif(nrow(tmp_df) * 2)
Draw two columns filled with random numbers drawn uniformly in the interval [0, 1].
Alternatively, you can look into using `rbinom`.
2) 2 * (... >= tmp_df$FRQ) * tmp_df$EFF
Create (-1, 1) indicator to see whether `EFF` should be fliped, then multiply, exploiting conformability rules.
3) lapply(...)
Do the above 500 times.
The rest simply label, and bind the simulated results to your original data.
Upvotes: 2