Ben Farmer

Reputation: 2854

HDF5 (C interface) consumes all RAM with repeated calls to H5Oget_info_by_name

I am seeing weird behaviour from my HDF5 code in C++ (using the C interface). It maxes out the RAM on my system, but then seems to continue running just fine. I am not sure whether the point where it stops allocating more RAM is a coincidence, or whether this is expected behaviour of some internal buffers. Either way, the problem is that once this happens no other application can get any RAM, so the whole system starts thrashing and locks up.

I ran the code through valgrind --tool=massif and massif-visualizer to try and see what is happening, and got the output below:

[Image: massif-visualizer output]

Looking through the call chain in a typical snapshot (displayed in the image), it looks like it is occurring in one of my functions op_func, which is called repeatedly by H5Literate as I iterate through a group in the HDF5 file to identify all the datasets that it contains.

But this function isn't even reading or writing any significant data! All it does is call H5Oget_info_by_name once per link to find out which objects are datasets, and then store their names. So I don't see why this should be consuming all my RAM. In case I am doing something stupid, here is the code for the function that is repeatedly called:

        // Callback for H5Literate: collects the names of all datasets in a group,
        // skipping any whose name contains "_isvalid".
        inline herr_t op_func (hid_t loc_id, const char *name_in, const H5L_info_t *,
                void *operator_data)
        {
            H5O_info_t      infobuf;
            std::vector<std::string> &od = *static_cast<std::vector<std::string> *> (operator_data);
            std::string name(name_in);

            // Propagate failure; a negative return value stops the iteration.
            if (H5Oget_info_by_name (loc_id, name.c_str(), &infobuf, H5P_DEFAULT) < 0)
                return -1;

            switch (infobuf.type)
            {
                case H5O_TYPE_DATASET:
                    if (name.find("_isvalid") == std::string::npos)
                        od.push_back(name);
                    break;
                case H5O_TYPE_GROUP:
                case H5O_TYPE_NAMED_DATATYPE:
                default:
                    break;  // Not a dataset: nothing to record.
            }

            return 0;  // Zero tells H5Literate to continue with the next link.
        }

As you can see, it is pretty simple: I am just harvesting names and pushing them onto a vector of strings. It could probably use some better error checking, but it seems to work just fine aside from this RAM issue.
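To convince myself that the vector of strings is not the culprit, the name-harvesting step can be separated from HDF5 entirely. The following is only a minimal sketch of that idea: `keep_dataset_name` and `harvest_names` are hypothetical helper names that do not appear in my real code, and the list of candidate names is assumed to come from somewhere other than H5Literate.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the same filter that op_func applies to dataset names,
// with all HDF5 calls removed.
inline bool keep_dataset_name(const std::string &name)
{
    return name.find("_isvalid") == std::string::npos;
}

// Hypothetical helper: collect every candidate name that passes the filter,
// mirroring what op_func pushes onto the output vector.
inline std::vector<std::string> harvest_names(const std::vector<std::string> &candidates)
{
    std::vector<std::string> kept;
    for (const std::string &name : candidates)
    {
        if (keep_dataset_name(name))
            kept.push_back(name);
    }
    return kept;
}
```

Feeding this the same number of names as there are links in the group should show how much memory the strings alone account for; if that is small, the growth presumably comes from inside the HDF5 calls rather than from the vector.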

Am I doing something dumb to cause a memory leak here? Or is HDF5 being REALLY aggressive in its internal buffering, and buffering up way more information than I realise? Perhaps I just need to tell it to clear some buffers or do garbage collection or something?

Upvotes: 1

Views: 396

Answers (0)
