MastRofDsastR

Reputation: 161

How to plot a matrix of p-values into a histogram

I've been searching online for how to plot a histogram of values stored in a matrix, but I am having trouble doing so. I have created a forvalues loop in which I store the p-values from 1000 trials of a test in a matrix, and I now want to plot these p-values in a histogram.

/* Loop generating 1000 trials and storing p-values */

mata: pvalue1000 = J(1000,1,.)

forvalues i = 1/1000 {

    clear
    quiet set obs 1000
    gen n = _n
    quiet gen A = runiform()
    quiet ttest A = 0.20

    /*store the mean, in a local variable*/
    local pvalue = r(p)
    gen pval = r(p)

    /*transfer the p-value from the "local" to the matrix */
    mata: pvalue1000[`i',1] = `pvalue'
}

mata: pvalue1000
hist pvalue1000

In this case, hist pvalue1000 reports that pvalue1000 is not found. When I try hist pval instead, the histogram shows only one p-value (I am assuming this is because the hist command is outside the loop).

Also note that the matrix stores only p-values, all in a single column, so it has 1000 rows and 1 column.

So how can I pass hist a variable that contains all of the p-values, so that the histogram plots every one of them?

Upvotes: 0

Views: 837

Answers (2)

Andrei

Reputation: 2665

Stata's main dataset, the matrices you access with the matrix command, and Mata matrices all live separately and need separate commands to work with, but you can transfer data among all three.

In your case, you want to load a Mata matrix into the Stata dataset, which you can do as follows:

clear
getmata pvalue1000, double

Please note that your p-values are very small, so you need the double option. Otherwise, with single (float) precision, you'll get zeros.
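A minimal end-to-end sketch of this route, from a Mata vector to a histogram. It uses a hypothetical stand-in vector, pv_demo (1000 tiny values), in place of the question's pvalue1000. Setting the number of observations explicitly, so that the dataset matches the vector's 1000 rows before getmata runs, is an addition of this example rather than part of the answer above.

```stata
* Start an empty dataset with one observation per row of the Mata vector
clear
set obs 1000

* Hypothetical stand-in for pvalue1000: 1000 tiny values held in Mata
mata: pv_demo = 1e-100 :* runiform(1000, 1)

* Copy the Mata column vector into a new double variable named pv_demo
getmata pv_demo, double

* histogram now receives a variable name, which is what it expects
histogram pv_demo
```

The same two commands (getmata with double, then histogram) apply to pvalue1000 once the question's loop has filled it.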

Upvotes: 0

Nick Cox

Reputation: 37208

histogram expects a variable name, and you are first feeding it a matrix name, so no go there, as matrices and variables are utterly different in Stata.

Conversely, when you then feed it a variable name, your variable pval holds only the last P-value, copied into every observation, as all previous incarnations of pval were cleared out of the way by the clear at the top of your loop. (Putting the histogram command inside the loop would have no useful effect here, as at best there is only one P-value inside the variable at a time.)
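To see this directly, here is a sketch that reproduces the state the question's loop leaves behind after its final pass: one sample, one ttest, and one gen. Because r(p) is a single number, the summarize output shows the same value as both minimum and maximum.

```stata
* Mimic one pass of the question's loop: clear, draw a sample, run ttest
clear
quietly set obs 1000
quietly gen A = runiform()
quietly ttest A = 0.20

* r(p) is a single number, so gen copies it into all 1000 observations
gen pval = r(p)

* Minimum and maximum are identical: histogram pval can show only one bar
summarize pval
```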

Matrices can be very useful, but they are at best indirect for this purpose.

Presumably your problem is not your real problem. If you have samples of size 1000 from a uniform on (0, 1), then sample means will all be close to 0.5 and P-values of a test that the mean is 0.2 will all be practically indistinguishable from 0 and no histogram is interesting or useful. But this code seems to capture your intent:

clear 
set obs 1000 
gen A = . 
gen pval = . 

quietly forval i = 1/1000 {
    replace A = runiform()
    ttest A = 0.20
    replace pval = r(p) in `i' 
}

hist pval 
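One way to check the claim that these P-values are all practically 0 is to summarize pval after the loop. The sketch below repeats the loop above with two changes that are assumptions of this example, not part of the answer: set seed with an arbitrary value for reproducibility, and pval created as double so that tiny P-values are not rounded to zero by float storage.

```stata
* Same loop as above, plus two assumptions added for this check:
* an arbitrary seed for reproducibility, and double storage for pval
clear
set seed 12345
set obs 1000
gen A = .
gen double pval = .

quietly forval i = 1/1000 {
    replace A = runiform()
    ttest A = 0.20
    replace pval = r(p) in `i'
}

* Every P-value is expected to be practically 0; look at the largest
summarize pval
display "largest P-value: " r(max)
```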

What's not in this code:

  1. Putting results in locals and/or matrices and/or taking them out again is not needed for any purpose. We put them directly into a variable one by one, because that is the result needed.

  2. The observation numbers _n are not used for anything, so they seem dispensable too, although naturally they may be needed for your real problem.

  3. Your comment "store the mean" is not matched by any code that you wrote.

Note also that talking about locals as variables is natural for anyone familiar with other programming languages, but in no sense is it Stata terminology. Locals are local macros, not variables.
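A small sketch of the distinction, using Stata's bundled auto dataset purely for illustration: the local holds text that is substituted into later commands, no variable of that name is created, and getting a variable requires putting the value into one explicitly.

```stata
* Illustration only, using Stata's bundled auto dataset
sysuse auto, clear
quietly ttest mpg = 20

* A local macro stores the result as text, not as a column of data
local pvalue = r(p)
display "`pvalue'"

* No variable called pvalue exists, so confirm fails (nonzero _rc)
capture confirm variable pvalue
display _rc

* To get a variable, the value must be put into one explicitly
gen double pval = `pvalue'
```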

Upvotes: 1
