Reputation: 13
I have a task for a master course that involves the sampling of probability distributions via Monte Carlo methods. I want to run some tests before, so I'm sampling the chi square distribution with 2 dof (it can be done with the inverse method). Anyway, I am running into problems with array sizes, anything over ~30000 elements will result with all the points of the probability distribution accumulating at 1, or even with a program crash.
At the end of the post you can see a screenshot of the result with 10000 points and the code I wrote. I have used many versions of the g++ compiler, up to 4.9.2, and none of them worked. The OS is Windows 7. Aaaand the same code works flawlessly in a gentoo computer from a friend. Any suggestions?
Thanks in advance!
Manuel J.
And here is the code I'm using:
I've removed it to keep the post shorter. Please see edited code
Edit: I've made several changes to the code, the main of which is the transition from plain arrays to vectors, and the problem is still there: anything over ~10000 elements will not work. The problem is definitely the histograma
function: the contents of the histo.dat
file when you give it a too large vector are:
inf 100000
inf 0
inf 0
inf 0
...
inf 0
The code as it is now is as follows. The problem is definitely the max and min functions, but I've not the slightest idea about where the problem is.
#include <iostream>
#include <string>
#include <cmath>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <vector>
#define PI 3.14159265359
#define NHISTOMAX 100
#define N 100000
#define GNUPLOT_PATH "D:\\gnuplot\\bin\\gnuplot.exe -persist"
using namespace std;
double max (vector<double> v);
double min (vector<double> v);
void histoplot (string name1);
void histograma (string name, vector<double> v, size_t num2);
vector<double> v(N);
int main (void)
{
srand(time(NULL));
for(size_t i=0;i<N;i++)
v[i]=-2*log(1.0*rand()/RAND_MAX);
histograma("hist.dat",v,NHISTOMAX);
histoplot("hist.dat");
system("pause");
return 0;
}
void histograma (string name, vector<double> v, size_t num2)
{
ofstream fileout;
double max1, min1, delta;
size_t i, j, num=v.size();
vector<int> histo(NHISTOMAX);
if(num2>NHISTOMAX) cout << "Too many intervals. " << endl;
else
{
for(i=0;i<num2;i++)
histo[i]=0;
max1=max(v);
min1=min(v);
delta=(max1-min1)/num2;
for(i=0;i<num;i++)
{
j=(size_t)((v[i]-min1)/delta);
if(j==NHISTOMAX) j--;
histo[j]++;
}
fileout.open(name.c_str());
for(i=0;i<num2;i++)
fileout << min1+(i+0.5)*delta << "\t" << histo[i] << endl;
fileout.close();
cout << "Histogram generated! Output file: " << name << endl << endl;
}
return;
}
void histoplot (string name1)
{
FILE *gp1;
gp1=popen(GNUPLOT_PATH,"w");
if(gp1==NULL)
cout << "Unable to open pipe. Check gnuplot.exe's path" << endl;
else
{
fprintf(gp1,"unset key\n");
fprintf(gp1,"set term wxt\n");
fprintf(gp1,"set output\n");
fprintf(gp1,"plot '");
fprintf(gp1,name1.c_str());
fprintf(gp1,"' w histeps\n");
fflush(gp1);
pclose(gp1);
}
return;
}
double max (vector<double> v)
{
double aux=v[0];
size_t i, num=v.size();
for(i=1;i<num;i++)
if(aux<v[i])
aux=v[i];
return aux;
}
double min (vector<double> v)
{
double aux=v[0];
size_t i, num=v.size();
for(i=1;i<num;i++)
if(aux>v[i])
aux=v[i];
return aux;
}
Edit 2: typical plot of the values in the v vector. All of them positive and under 25.
Plot of a typical vector obtained.
Upvotes: 0
Views: 513
Reputation: 35455
Use std::vector:
#include <vector>
#define N 10000
int main (void)
{
std::vector<double> v(N);
//...
histograma("hist.dat",v.data(), N, NHISTOMAX);
// or
// histograma("hist.dat", &v[0], N, NHISTOMAX);
Note that no change to the histograma
function itself is required. The only difference in the call is that v.data()
(or &v[0]
) is used to return a pointer to the start of the internal buffer that stores the array of double
.
In addition, instead of writing min
and max
functions, use std::min_element
and std::max_element
found in the <algorithm>
header:
max1 = *std::max_element(v.begin(), v.end());
min1 = *std::min_element(v.begin(), v.end());
or if using C++ 11, std::minmax_element to get both:
auto pr = std::minmax_element(v.begin(), v.end());
min1 = *(pr.first);
max1 = *(pr.second);
Also, if as the other answer suggested, and that you have a memory overwrite, I suggest you change from using operator [ ]
to using vector::at()
to access your elements. Using at()
will throw an out_of_range
exception as soon as you make an out-of-bounds access. Then you fix those errors before converting back to using [ ]
for the vectors.
For example, this code:
j=(size_t)((v[i]-min1)/delta);
if(j==NHISTOMAX) j--;
histo[j]++;
What if j
is some huge number, far exceeding the bounds of the histo
vector? Decrementing j
by 1 isn't going to help. To see if this is an issue, at()
can be used here:
j=(size_t)((v[i]-min1)/delta);
if(j==NHISTOMAX) j--;
histo.at(j)++;
If you run the program, if j
is out of bounds, an exception is thrown as soon as you attempt to increment histo[j]
. You can then inspect the issue and fix the error(s).
Upvotes: 1
Reputation: 6204
Looks like you have a potential array overflow at:
delta = (max1-min1)/num2;
...
j = (int)((v[i]-min1)/delta);
histo[j]++;
when v[i] == max1
this makes j == num2 == NHISTOMAX
but histo[NHISTOMAX]
is out of bounds (last valid element is NHISTOMAX-1
since array indexes start at 0).
Since you tagged your question as C++ I would use std::vector<double>
instead of raw arrays:
#include <vector>
...
std::vector<double> v(N);
...
void histograma (string name, const std::vector<double> v, const int maxHistoSize)
{
std::vector<double> histo(maxHistoSize + 1);
...
delta = (max1 - min1) / maxHistoSize;
...
// There's likely a more C++ idiomatic way to get the min/max of a
// std::vector<> but this works fine with a few changes.
double max (const std::vector<double> v)
{
double minValue = v[0];
for(int i = 1; i< v.size(); i++)
{
if (minValue < v[i]) minValue = v[i];
}
return aux;
}
// Same for min()
Upvotes: 0