user1357015
user1357015

Reputation: 11686

Creating a distribution from simulated samples in matrix format

I have a matrix of statistics, called T_{i,j}. I then simulated 1000 samples. I would like to use the 1000 samples to build a distribution and then calculate a p-value for my observed T_{i,j}.

A sample T_{i,j} matrix looks like this:

         V12         V13        V22        V23       V117       V146
V12  0.009900990 0.008281829 0.01490863 0.01548161 0.01342882 0.01287918
V13  0.008281829 0.031250000 0.04367911 0.04597988 0.03876530 0.03182001
V22  0.014908629 0.043679113 0.50000000 0.36522152 0.45404452 0.09666729
V23  0.015481606 0.045979882 0.36522152 0.50000000 0.47827009 0.10272845
V117 0.013428819 0.038765301 0.45404452 0.47827009 0.50000000 0.09810254
V146 0.012879176 0.031820011 0.09666729 0.10272845 0.09810254 0.09090909

What I would like to do is to easily get p-values for each possible entry. In the above matrix there are 21 separate statistics as everything below the diagonal is just the transpose of everything above.

I realize I can go in with for loops to look at each (i,j) entry over all the samples, sort them and then figure out where my observed lies, but I was wondering perhaps there is an easier R way to do it?

I have put a sample set of data here (data outputted via dput): http://temp-share.com/show/3YgF5Ww2x

Upvotes: 0

Views: 284

Answers (1)

ndoogan
ndoogan

Reputation: 1925

If LT is a list of some number of null hypothesis simulated matrices like T, and you want to do a one-tailed test (say, above), then you can count the number of times each element of T fell above or equal to the associated value in the simulation. I'm using reduce to sum the 1000 matrices returned from the lapply.

ct <- Reduce("+", lapply(LT,function(x) x >= T))

The result is a matrix of the same size as T that counts (ct) the number of times the elements of T were exceeded by (or were equal to) the corresponding elements of the matrices in LT. Divide this matrix by the simulation sample size (number of simulations).

p <- ct / length(LT)

p is a matrix of approximate p values representing the probability that a null hypothesis simulation is at least as extreme (on the upper end) as the observed data. If any p < alpha, then you might say the null hypothesis is a poor model in the case of the specific element of the observation.

Adjust the test "x >= T" to run the test you actually want to run, which could be a two-sided test.

Upvotes: 2

Related Questions