Reputation: 21
I'm working on a C++ project that deals with comma-separated data (CSV). I read the data from a .csv file into a vector of CsvRow objects.
Today I ran into some really weird std::bad_alloc exceptions, thrown in even weirder situations. In the first test case, the one that ran longest before throwing, I read a whole CSV file into a vector. The file has 500,000 rows and is about 70MB. It was read into memory without trouble, but a few seconds into the sorting procedure std::bad_alloc was thrown. The program used roughly 67MB of RAM.
Note: I'm using boost's flyweights in order to reduce memory consumption.
But the second test case was even stranger: I read a 146KB file with a few hundred lines, and this time the exception was thrown while reading the data into the vector, which is absurd given that a 70MB file had been read successfully before.
I suspect a memory leak, but my machine has 8GB of RAM and runs 64-bit Windows 8. I'm using CodeBlocks and a MinGW 64-bit Boost distro. Any help would be appreciated. Here is the chunk of code in which std::bad_alloc is thrown:
Reading data from a csv file
std::ifstream file(file_name_);
int k = 0;
for (CsvIterator it(file); it != CsvIterator(); ++it) {
    if (columns_ == 0) {
        columns_ = (*it).size();
        for (unsigned int i = 0; i < columns_; i++) {
            distinct_values_.push_back(*new __gnu_cxx::hash_set<std::string,
                                       std::hash<std::string> >());
        }
    }
    for (unsigned int i = 0; i < columns_; i++) {
        distinct_values_[i].insert((*it)[i]);
    }
    all_rows_[k] = (*it);
    k++;
}
Sorting the vector using an internal struct stored in my class
struct SortRowsStruct
{
    CsvSorter* r;
    SortRowsStruct(CsvSorter* rr) : r(rr) { };
    bool operator() (CsvRow a, CsvRow b)
    {
        for (unsigned int i = 0; i < a.size(); i++) {
            if (a[r->sorting_order_[i]] != b[r->sorting_order_[i]]) {
                int dir = r->sorting_direction_[i];
                switch (dir) {
                    case 0:
                        return (a[r->sorting_order_[i]] < b[r->sorting_order_[i]]);
                        break;
                    case 1:
                        return !(a[r->sorting_order_[i]] < b[r->sorting_order_[i]]);
                        break;
                    case 2:
                        return true;
                        break;
                    default:
                        return true;
                }
            }
        }
        return true;
    }
};
Then I use std::sort() to sort the vector of CsvRows:
SortRowsStruct s(this);
std::sort(all_rows_.begin(), all_rows_.end(), s);
This line looks really suspicious, but I could not figure out an easier way to initialize those hash sets.
distinct_values_.push_back( *new __gnu_cxx::hash_set<std::string,
std::hash<std::string> >() );
Deleting those hash sets in the destructor crashes the program (SIGSEGV). Another thing to point out is that I can't use the default 32-bit gdb debugger because my MinGW is 64-bit; the 32-bit gdb is buggy and won't work with MinGW 64.
Edit:
Could the boost::flyweight<std::string> which I use in the CsvRow class cause the problem?
In addition to that, here is a part of the CsvRow class:
private:
std::vector<boost::flyweights::flyweight<std::string> > row_data_;
And the overloaded [] operator on the CsvRow class:
std::string const& CsvRow::operator[](std::size_t index) const
{
boost::flyweights::flyweight<std::string> fly = row_data_[index];
return fly.get();
}
Thanks in advance
EDIT - SOLVED:
So, this question solved my problem, although I didn't even think of it.
Every custom comparator we pass to std::sort() has to define a strict weak ordering, that is, it must satisfy:
1. Irreflexive
2. Asymmetric
3. Transitive
4. Transitivity of incomparability
More info: this question and this Wiki article.
Actually, I violated the first one (irreflexivity): if both CsvRow objects are equal, the comparator should not return true as if one preceded the other, but should return false instead.
I solved the whole problem by changing only the default return value used when CsvRow a and CsvRow b are equal.
bool operator() (CsvRow a, CsvRow b)
{
    for (unsigned int i = 0; i < a.size(); i++) {
        if (a[r->sorting_order_[i]] != b[r->sorting_order_[i]]) {
            ...
            ...
        }
    }
    return false; // this line does not violate the irreflexivity rule
    // return true; // but this one does
}
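For readers who want something they can compile, here is a minimal sketch of a comparator that respects the rules listed above. It is not the original CsvSorter code: Row, RowLess, and sort_rows are hypothetical stand-ins, a row is assumed to be a plain std::vector<std::string>, and only two directions are assumed (0 for ascending, anything else for descending).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for CsvRow: a row is just a vector of strings.
using Row = std::vector<std::string>;

// Hypothetical comparator modelled on SortRowsStruct.
// order[i] is the column used as the i-th sort key;
// direction[i] is 0 for ascending, anything else for descending (assumption).
struct RowLess {
    std::vector<std::size_t> order;
    std::vector<int> direction;

    // Rows are taken by const reference so std::sort does not copy a row
    // for every comparison.
    bool operator()(const Row& a, const Row& b) const {
        for (std::size_t i = 0; i < order.size(); ++i) {
            const std::string& x = a[order[i]];
            const std::string& y = b[order[i]];
            if (x != y) {
                // First key that differs decides the order.
                return direction[i] == 0 ? x < y : y < x;
            }
        }
        return false;  // equal on every key: neither row precedes the other
    }
};

// Sorts the rows in place with the comparator above.
inline void sort_rows(std::vector<Row>& rows, const RowLess& less) {
    std::sort(rows.begin(), rows.end(), less);
}
```

The important detail is that every path which cannot tell the two rows apart returns false, so comparing a row with itself never reports that it precedes itself.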
Thanks to everyone who tried to help. Remember this solution in case you experience a similar problem. It's pretty tricky.
Upvotes: 0
Views: 1777
Reputation: 249153
This:
distinct_values_.push_back( *new __gnu_cxx::hash_set<std::string,
std::hash<std::string> >() );
Looks like you are trying to add one default-constructed element to the vector. There's an easier way:
distinct_values_.resize(distinct_values_.size() + 1);
Apart from being easier to type, and more generic, it's also a lot more correct: we should not be new-ing anything here, just creating a single value at the end, and we should let the vector construct it rather than copying it in, which might be wasteful.
And of course we should never try to delete these values.
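As a rough sketch of the same idea applied to the whole loop, the vector can be sized once, when the number of columns becomes known. The names here are illustrative only: record_row and DistinctSet are hypothetical, and std::unordered_set is assumed as a stand-in for __gnu_cxx::hash_set.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// std::unordered_set stands in for __gnu_cxx::hash_set here (an assumption).
using DistinctSet = std::unordered_set<std::string>;

// Hypothetical helper: records the distinct values seen in each column.
inline void record_row(std::vector<DistinctSet>& distinct_values,
                       const std::vector<std::string>& row) {
    if (distinct_values.empty()) {
        // The vector default-constructs one set per column and owns them,
        // so there is nothing to new and nothing to delete.
        distinct_values.resize(row.size());
    }
    // Guard against rows that are longer than the first row.
    for (std::size_t i = 0; i < row.size() && i < distinct_values.size(); ++i) {
        distinct_values[i].insert(row[i]);
    }
}
```

Because the sets live inside the vector, they are destroyed together with it, and no destructor code is needed for them.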
Upvotes: 1