Evan
Evan

Reputation: 1499

How to make a probability simulator in R?

With the following data frame:

A1  A2  EFF       FRQ      
A   G   0.0125    0.4578  
T   C   0.0143    0.1293    
T   C   -0.017    0.8984  
A   G   -0.018    0.8945   
A   G   -0.009    0.8652   
A   G   0.0001    0.3931   

I would like to make two probability "draws" from the effect size based on the FRQ column. I want to make a new column called sim_1 where 45.78% of the time, the EFF keeps it's sign, and where 54.22% of the time, the EFF switches it's sign. I would like to then sum two of these random events for each row. For example, let's say that two random numbers 0-100 are generated. 78.33 and 32.16. I will take anything < 45.78 as being indicative of keeping the EFF the same. Since I randomly rolled a 78 and 32, the sum will be -0.0125 (for the 78.33 roll) and 0.0125 for the (32.16) roll, which equals 0.

In the second row, let's say that I roll two random numbers of 88.22 and 67.10. Because neither of these numbers are below 12.93, then the EFF sign will be flipped for both the 88.22 and 67.10 roll, leaving us with a sum of -0.0286 (-0.0143 + -0.0143).

I would like to do 500 simulation columns in this manner so that the final output looks like:

A1  A2  EFF       FRQ      Sim_1   Sim_2   Sim_3...
A   G   0.0125    0.4578   0       -       -
T   C   0.0143    0.1293   -0.0286 -       -
T   C   -0.017    0.8984  -        -       -
A   G   -0.018    0.8945  -        -       -
A   G   -0.009    0.8652  -        -       -
A   G   0.0001    0.3931  -        -       -

Note: If you generate an output file, it may not match mine since it is based on randomness.

Upvotes: 1

Views: 575

Answers (1)

Alex
Alex

Reputation: 15708

Using your data:

tmp_df <- structure(list(A1 = structure(c(1L, 2L, 2L, 1L, 1L, 1L), 
                                        .Label = c("A", "T"), class = "factor"), 
                         A2 = structure(c(2L, 1L, 1L, 2L, 2L, 2L),
                                        .Label = c("C", "G"), class = "factor"), 
                         EFF = c(0.0125, 0.0143, -0.017, -0.018, -0.009, 1e-04), 
                         FRQ = c(0.4578, 0.1293, 0.8984, 0.8945, 0.8652, 0.3931)),
                    .Names = c("A1", "A2", "EFF", "FRQ"), class = "data.frame", row.names = c(NA, -6L))

Do the following

set.seed(0)

tmp_results <- lapply(1:500, function(i) rowSums(2 * (0.5 - (matrix(runif(nrow(tmp_df) * 2), ncol = 2) >= tmp_df$FRQ)) * tmp_df$EFF))



tmp_out <- as.data.frame(tmp_results)
names(tmp_out) <- paste("Sim", 1:500)

tmp_out <- cbind(tmp_df, tmp_out)

producing:

> tmp_out[, 1:10]
  A1 A2     EFF    FRQ   Sim 1   Sim 2   Sim 3   Sim 4   Sim 5   Sim 6
1  A  G  0.0125 0.4578 -0.0250  0.0000  0.0250 -0.0250  0.0000  0.0250
2  T  C  0.0143 0.1293 -0.0286 -0.0286 -0.0286 -0.0286  0.0000 -0.0286
3  T  C -0.0170 0.8984 -0.0340 -0.0340 -0.0340 -0.0340 -0.0340 -0.0340
4  A  G -0.0180 0.8945 -0.0360  0.0000 -0.0360 -0.0360 -0.0360 -0.0360
5  A  G -0.0090 0.8652  0.0000 -0.0180 -0.0180 -0.0180 -0.0180  0.0000
6  A  G  0.0001 0.3931  0.0002 -0.0002 -0.0002  0.0000 -0.0002  0.0000

Explanation of the lapply step:

1) matrix(runif(nrow(tmp_df) * 2)
Draw two columns filled with random numbers drawn uniformly in the interval [0, 1].
Alternatively, you can look into using `rbinom`.

2) 2 * (... >= tmp_df$FRQ) * tmp_df$EFF
Create (-1, 1) indicator to see whether `EFF` should be fliped, then multiply, exploiting conformability rules.

3) lapply(...) 
Do the above 500 times.

The rest simply label, and bind the simulated results to your original data.

Upvotes: 2

Related Questions