manugutito
manugutito

Reputation: 13

Issue with big array size c++

I have a task for a master course that involves the sampling of probability distributions via Monte Carlo methods. I want to run some tests before, so I'm sampling the chi square distribution with 2 dof (it can be done with the inverse method). Anyway, I am running into problems with array sizes, anything over ~30000 elements will result with all the points of the probability distribution accumulating at 1, or even with a program crash.

At the end of the post you can see a screenshot of the result with 10000 points and the code I wrote. I have used many versions of the g++ compiler, up to 4.9.2, and none of them worked. The OS is Windows 7. Aaaand the same code works flawlessly in a gentoo computer from a friend. Any suggestions?

Thanks in advance!

Manuel J.

Sampling with 10000 points

And here is the code I'm using:

I've removed it to keep the post shorter. Please see edited code

Edit: I've made several changes to the code, the main of which is the transition from plain arrays to vectors, and the problem is still there: anything over ~10000 elements will not work. The problem is definitely the histograma function: the contents of the histo.dat file when you give it a too large vector are:

inf 100000
inf 0
inf 0
inf 0
...
inf 0

The code as it is now is as follows. The problem is definitely the max and min functions, but I've not the slightest idea about where the problem is.

#include <iostream>
#include <string>
#include <cmath>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <vector>
#define PI 3.14159265359
#define NHISTOMAX 100
#define N 100000
#define GNUPLOT_PATH "D:\\gnuplot\\bin\\gnuplot.exe -persist"
using namespace std;
double max (vector<double> v);
double min (vector<double> v);
void histoplot (string name1);
void histograma (string name, vector<double> v, size_t num2);

vector<double> v(N);

int main (void)
{   
    srand(time(NULL));

    for(size_t i=0;i<N;i++)
    v[i]=-2*log(1.0*rand()/RAND_MAX);

    histograma("hist.dat",v,NHISTOMAX);

    histoplot("hist.dat");

    system("pause");
    return 0;
}

void histograma (string name, vector<double> v, size_t num2)
{
    ofstream fileout;
    double max1, min1, delta;
    size_t i, j, num=v.size();
    vector<int> histo(NHISTOMAX);

    if(num2>NHISTOMAX) cout << "Too many intervals. " << endl;

    else
    {
        for(i=0;i<num2;i++)
            histo[i]=0;

        max1=max(v);
        min1=min(v);
        delta=(max1-min1)/num2;

        for(i=0;i<num;i++)
        {
            j=(size_t)((v[i]-min1)/delta); 
            if(j==NHISTOMAX) j--;
            histo[j]++;
        }

        fileout.open(name.c_str());
        for(i=0;i<num2;i++)
            fileout << min1+(i+0.5)*delta << "\t" << histo[i] << endl;
        fileout.close();

        cout << "Histogram generated! Output file: " << name << endl << endl;
    }

    return;
}

void histoplot (string name1)
{
    FILE *gp1;
    gp1=popen(GNUPLOT_PATH,"w");

    if(gp1==NULL)
        cout << "Unable to open pipe. Check gnuplot.exe's path" << endl;

    else
    {
        fprintf(gp1,"unset key\n");
        fprintf(gp1,"set term wxt\n");
        fprintf(gp1,"set output\n");
        fprintf(gp1,"plot '");
        fprintf(gp1,name1.c_str());
        fprintf(gp1,"' w histeps\n");
        fflush(gp1);
        pclose(gp1);
    }

    return;
}

double max (vector<double> v)
{
    double aux=v[0];
    size_t i, num=v.size();
    for(i=1;i<num;i++)
        if(aux<v[i])
            aux=v[i];
    return aux;
}

double min (vector<double> v)
{
    double aux=v[0];
    size_t i, num=v.size();
    for(i=1;i<num;i++)
        if(aux>v[i])
            aux=v[i];
    return aux;
}

Edit 2: typical plot of the values in the v vector. All of them positive and under 25.

Plot of a typical vector obtained.

Upvotes: 0

Views: 513

Answers (2)

PaulMcKenzie
PaulMcKenzie

Reputation: 35455

Use std::vector:

#include <vector>
#define N 10000
int main (void)
{
    std::vector<double> v(N); 
    //...
    histograma("hist.dat",v.data(), N, NHISTOMAX);  
    // or
    // histograma("hist.dat", &v[0], N, NHISTOMAX);  

Note that no change to the histograma function itself is required. The only difference in the call is that v.data() (or &v[0]) is used to return a pointer to the start of the internal buffer that stores the array of double.

In addition, instead of writing min and max functions, use std::min_element and std::max_element found in the <algorithm> header:

    max1 = *std::max_element(v.begin(), v.end());
    min1  = *std::min_element(v.begin(), v.end());

or if using C++ 11, std::minmax_element to get both:

    auto pr = std::minmax_element(v.begin(), v.end());
    min1 = *(pr.first);
    max1 = *(pr.second);

Also, if as the other answer suggested, and that you have a memory overwrite, I suggest you change from using operator [ ] to using vector::at() to access your elements. Using at() will throw an out_of_range exception as soon as you make an out-of-bounds access. Then you fix those errors before converting back to using [ ] for the vectors.

For example, this code:

        j=(size_t)((v[i]-min1)/delta); 
        if(j==NHISTOMAX) j--;
        histo[j]++;

What if j is some huge number, far exceeding the bounds of the histo vector? Decrementing j by 1 isn't going to help. To see if this is an issue, at() can be used here:

        j=(size_t)((v[i]-min1)/delta); 
        if(j==NHISTOMAX) j--;
        histo.at(j)++;

If you run the program, if j is out of bounds, an exception is thrown as soon as you attempt to increment histo[j]. You can then inspect the issue and fix the error(s).

Upvotes: 1

uesp
uesp

Reputation: 6204

Looks like you have a potential array overflow at:

    delta = (max1-min1)/num2;
   ...
        j = (int)((v[i]-min1)/delta);
        histo[j]++;

when v[i] == max1 this makes j == num2 == NHISTOMAX but histo[NHISTOMAX] is out of bounds (last valid element is NHISTOMAX-1 since array indexes start at 0).

Since you tagged your question as C++ I would use std::vector<double> instead of raw arrays:

#include <vector>
...
std::vector<double> v(N);
...
void histograma (string name, const std::vector<double> v, const int maxHistoSize)
{
     std::vector<double> histo(maxHistoSize + 1);
     ...
     delta = (max1 - min1) / maxHistoSize;
...    

      // There's likely a more C++ idiomatic way to get the min/max of a 
      // std::vector<> but this works fine with a few changes.
double max (const std::vector<double> v)
{
    double minValue = v[0];

    for(int i = 1; i< v.size(); i++) 
    {
         if (minValue < v[i]) minValue = v[i];
    }

    return aux;
}

// Same for min()

Upvotes: 0

Related Questions