ChrisSampson87
ChrisSampson87

Reputation: 101

Reshaping and merging simulations in Stata

I have a dataset, which consists of 1000 simulations. The output of each simulation is saved as a row of data. There are variables alpha, beta and simulationid.

Here's a sample dataset:

simulationid    beta          alpha
1               0.025840106   20.59671241
2               0.019850549   18.72183088
3               0.022440886   21.02298228
4               0.018124857   20.38965861
5               0.024134726   22.08678021
6               0.023619479   20.67689981
7               0.016907209   17.69609466
8               0.020036455   24.6443037
9               0.017203175   24.32682682
10              0.020273349   19.1513272

I want to estimate a new value - let's call it new - which depends on alpha and beta as well as different levels of two other variables which we'll call risk and price. Values of risk range from 0 to 100, price from 0 to 500 in steps of 5.

What I want to achieve is a dataset that consists of values representing the probability that (across the simulations) new is greater than 0 for combinations of risk and price.

I can achieve this using the code below. However, the reshape process takes more hours than I'd like. And it seems to me to be something that could be completed a lot quicker.

So, my question is either:

i) is there an efficient way to generate multiple datasets from a single row of data without multiple reshape, or

ii) am I going about this in totally the wrong way?

set maxvar 15000

/* Input sample data */
input     simulationid  beta          alpha
1               0.025840106   20.59671241
2               0.019850549   18.72183088
3               0.022440886   21.02298228
4               0.018124857   20.38965861
5               0.024134726   22.08678021
6               0.023619479   20.67689981
7               0.016907209   17.69609466
8               0.020036455   24.6443037
9               0.017203175   24.32682682
10              0.020273349   19.1513272
end


forvalues risk = 0(1)100 {
forvalues price = 0(5)500 {
    gen new_r`risk'_p`price' = `price' * (`risk'/200)* beta - alpha
        gen probnew_r`risk'_p`price' = 0
        replace probnew_r`risk'_p`price' = 1 if new_r`risk'_p`price' > 0
        sum probnew_r`risk'_p`price', mean
        gen mnew_r`risk'_p`price' = r(mean)
    drop new_r`risk'_p`price' probnew_r`risk'_p`price'
}
}
drop if simulationid > 1
save simresults.dta, replace

forvalues risk = 0(1)100 {
    clear
    use simresults.dta
    reshape long mnew_r`risk'_p, i(simulationid) j(price)
    keep simulation price mnew_r`risk'_p
    rename mnew_r`risk'_p risk`risk'
    save risk`risk'.dta, replace
}

clear
use risk0.dta
forvalues risk = 1(1)100 {
    merge m:m price using risk`risk'.dta, nogen
    save merged.dta, replace
}

Upvotes: 0

Views: 109

Answers (1)

Nick Cox
Nick Cox

Reputation: 37278

Here's a start on your problem.

So far as I can see, you don't need more than one dataset.

The various reshapes and merges just rearrange what was first generated and that can be done within one dataset.

The code here in the first instance is for just one pair of values of alpha and beta. To simulate 1000 such, you would need 1000 times more observations, i.e. about 10 million, which is not usually a problem and to loop over the alphas and betas. But the loop can be tacit. We'll get to that.

This code has been run and is legal. It's limited to one alpha, beta pair.

clear 
input     simulationid  beta          alpha
1               0.025840106   20.59671241
2               0.019850549   18.72183088
3               0.022440886   21.02298228
4               0.018124857   20.38965861
5               0.024134726   22.08678021
6               0.023619479   20.67689981
7               0.016907209   17.69609466
8               0.020036455   24.6443037
9               0.017203175   24.32682682
10              0.020273349   19.1513272
end

local N = 101 * 101 
set obs `N' 

egen risk = seq(), block(101) 
replace risk = risk - 1 
egen price = seq(), from(0) to(100)
replace price = 5 * price 

gen result = (price * (risk/200)* beta[1] - alpha[1]) > 0 
bysort price risk: gen mean = sum(result) 
by price risk: replace mean = mean[_N]/_N 

Assuming now that you first read in 1000 values, here is a sketch of how to get the whole thing. This code has not been tested. That is, your dataset starts with 1000 observations; you then enlarge it to 10 million or so, and get your results. The tricksy part is using an expression for the subscript to ensure that each block of results is for a distinct alpha, beta pair. That's not compulsory; you could do it in a loop, but then you would need to generate outside the loop and replace within it.

local N = 101 * 101 * 1000 
set obs `N' 

egen risk = seq(), block(101) 
replace risk = risk - 1 
egen price = seq(), from(0) to(100)
replace price = 5 * price 
egen sim = seq(), block(10201) 

gen result = (price * (risk/200)* beta[ceil(_n/10201)] - alpha[ceil(_n/10201)]) > 0 
bysort sim price risk: gen mean = sum(result) 
by sim price risk: replace mean = mean[_N]/_N 

Other devices used: egen to set up in blocks; getting the mean without repeated calls to summarize; using a true-or-false expression directly.

NB: I haven't tried to understand what you are doing, but it seems to me that the price-risk-simulation conditions define single values, so calculating a mean looks redundant. But perhaps that is in the code because you wish to add further detail to the code once you have it working.

NB2: This seems a purely deterministic calculation. Not sure that you need this code at all.

Upvotes: 0

Related Questions