Reputation: 49
I want to write an instance of a class that contains members of different data types to disk and read it back whenever I need it. I used the code below to do this. The problem is that whenever I save the object, a file is created in the folder, but it is only 1 KB in size. Also, when I open the file from the same function that saves it, I can read the variables in the class, but when I move the read section to another function and open the file from there, the variables cannot be read. How can I fix the problem? Thanks in advance.
Write to a file:
stream.open("configuration/KMeansModel.bin", std::ios::out | std::ios::binary);
stream.write((char *)& kmeans, sizeof(kmeans));
stream.close();
Read from the file:
KMeans::KMeans kmeans_(umapFeatureLabel_);
stream_.open("configuration/KMeansModel.bin", std::ios::in | std::ios::binary);
stream_.read((char *)& kmeans_, sizeof(kmeans_));
stream_.close();
Class definition:
class KMeans
{
private:
int m_K;
int m_iters;
int m_dimensions;
int m_total_features;
std::vector<Cluster> m_clusters;
std::unordered_map<std::string, std::string> m_umapFeatureLabel;
std::unordered_map<int, std::vector<std::vector<long double>>> m_umapClusterFeatureList;
int getNearestClusterId(Feature feature);
public:
KMeans();
KMeans(std::unordered_map<std::string, std::string>& umapFeatureLabel);
void run(std::vector<Feature>& allFeatures);
void predict(Feature feature);
void updateKMeans(std::vector<Feature>& allNewFeaturesRead);
std::string getLabelOfFeature(std::string feature);
};
Upvotes: 0
Views: 1294
Reputation: 4733
Your file-saving code uses the sizeof operator, and your data structure includes vector and map objects.
For example, as far as sizeof is concerned, a std::vector object takes a small, fixed number of bytes, regardless of the number of elements. On a typical 64-bit implementation that is 24 bytes: three 8-byte pointers marking the start of the elements, the end of the elements, and the end of the allocated storage.
Say your vector has 100 elements, 8 bytes per element, and the elements are stored starting at memory address 424000. The write method will dutifully store into the file the numbers 424000 and 424800 (plus the end-of-storage address), but it will make no attempt to save the contents of memory locations 424000 to 424800. It has no way to know that 424000 is a pointer; as far as write is concerned, that's just a number.
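Hence, the file does not contain the information that is necessary to restore the vector state. This is also the likely reason the read appears to work inside the saving function: the original object is still alive there, so the addresses read back from the file still point at valid data. In another function, or in another run of the program, those addresses no longer refer to the original data.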
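To make the sizeof point concrete, here is a minimal sketch using only the standard library. The helper names objectBytes and payloadBytes are hypothetical, introduced just for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Bytes occupied by the vector object itself. This is all that
// stream.write((char*)&v, sizeof(v)) would copy to the file.
std::size_t objectBytes(const std::vector<double>& v)
{
    return sizeof(v);
}

// Bytes of element data, stored elsewhere in memory and only
// reachable through the pointers inside the vector object.
std::size_t payloadBytes(const std::vector<double>& v)
{
    return v.size() * sizeof(double);
}
```

Calling objectBytes on an empty vector and on a vector of 100 elements gives the same value, while payloadBytes grows with the element count. The exact value of sizeof depends on the standard library implementation.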
As mentioned in the comments above, the subject of saving complex pointer-based data structures into simple byte arrays for the purpose of file storage or network transmission is known as serialization or marshalling/unmarshalling.
It is a non-obvious subject of its own, in the same way that sorting algorithms or matrix multiplication are non-obvious subjects. It would probably take you a lot of time to come up with a properly debugged solution of your own, one that takes care of maintaining consistency between the saving and restoring code, and so on.
Serialization is a non-obvious subject, but it is also an old, well-known subject. So instead of painfully coming up with your own solution, you can rely on existing, publicly available code.
In similar fashion, the only situations where you would have to come up with your own matrix multiplication code are when:
Other than these, you would probably rely on existing code such as LAPACK.
Regarding serialization, as hinted at by Botje in the comments above, the Boost web site provides a C++ serialization library, along with a suitable tutorial.
I am providing below a small code sample using the Boost library. A simple guinea pig object contains an integer value, a string and a map. Of course, I am shamelessly borrowing from the Boost tutorial.
We need to include a couple of header files:
#include <map>
#include <string>
#include <fstream>
#include <iostream>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/utility.hpp>
#include <boost/serialization/map.hpp>
The object class, which pretends to store some token geographical info:
class CapitalMap
{
public:
CapitalMap(const std::string& myName, int myVersion) :
_name(myName), _version(myVersion)
{};
CapitalMap() = default; // seems required by serialization
inline void add(const std::string& country, const std::string& city)
{ _cmap[country] = city; }
void fdump(std::ostream& fh);
private:
std::string _name;
int _version;
std::map<std::string, std::string> _cmap;
friend class boost::serialization::access; // ALLOW FOR FILE ARCHIVAL
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
ar & _name;
ar & _version; // mind the name conflict with plain "version" argument
ar & _cmap;
}
};
A small debugging utility function:
void CapitalMap::fdump(std::ostream& ofh) // text dumping utility for debug
{
ofh << "CapitalMap name = \"" << _name << "\" version = " <<
_version << '\n';
for (const auto& pair : _cmap) {
auto country = pair.first; auto city = pair.second;
ofh << city << " is the capital of " << country << '\n';
}
}
Code to create the object, save it on disk, and (implicitly) deallocate it:
void buildAndSaveCapitalMap (const std::string& archiveName,
const std::string& mapName,
int version)
{
CapitalMap euroCapitals(mapName, version);
euroCapitals.add("Germany", "Berlin");
euroCapitals.add("France", "Paris");
euroCapitals.add("Spain", "Madrid");
euroCapitals.fdump(std::cout); // just for checking purposes
// save data to archive file:
std::ofstream ofs(archiveName);
boost::archive::text_oarchive oa(ofs);
oa << euroCapitals;
// locals go out of scope here and are destroyed in reverse order of creation:
// archive object first, then the ofstream (which closes the file),
// and finally the CapitalMap object
}
Small main program to create the file and then restore the object state from that file:
int main(int argc, char* argv[])
{
const std::string archiveName{"capitals.dat"};
std::cout << std::endl;
buildAndSaveCapitalMap(archiveName, "EuroCapitals", 42);
// go restore our CapitalMap object to its original state:
CapitalMap cm; // object created in its default state
std::ifstream ifs(archiveName);
boost::archive::text_iarchive inAr(ifs);
inAr >> cm; // read back object ...
std::cout << std::endl;
cm.fdump(std::cout); // check that it's actually back and in good shape ...
std::cout << std::endl;
return 0;
}
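The problem of maintaining consistency between saving and restoring code is neatly solved by giving operator “&” a direction-dependent meaning: on an output archive it behaves like “<<” (save), and on an input archive it behaves like “>>” (load), so a single serialize function describes both directions.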
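To illustrate the idea behind that operator trick without pulling in Boost, here is a toy sketch. ToyWriter, ToyReader, Settings, saveSettings and loadSettings are hypothetical names made up for this example; they are not Boost classes and only handle types that work with the stream operators.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Toy output archive (hypothetical): "&" writes a value as text.
struct ToyWriter {
    std::ostream& out;
    template <class T>
    ToyWriter& operator&(const T& value) { out << value << ' '; return *this; }
};

// Toy input archive (hypothetical): "&" reads a value back.
struct ToyReader {
    std::istream& in;
    template <class T>
    ToyReader& operator&(T& value) { in >> value; return *this; }
};

struct Settings {
    int clusters = 0;
    double tolerance = 0.0;

    // One member list, used for both directions, so the save and
    // load code cannot drift apart.
    template <class Archive>
    void serialize(Archive& ar)
    {
        ar & clusters;
        ar & tolerance;
    }
};

// Non-const parameter because serialize() is shared with loading.
std::string saveSettings(Settings& s)
{
    std::ostringstream os;
    ToyWriter writer{os};
    s.serialize(writer);
    return os.str();
}

Settings loadSettings(const std::string& text)
{
    std::istringstream is(text);
    ToyReader reader{is};
    Settings s;
    s.serialize(reader);
    return s;
}
```

Passing the string returned by saveSettings to loadSettings gives back an object with the same member values. The real Boost archives add what this toy leaves out, such as containers, strings with spaces, versioning and pointer tracking.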
Minor problems along the way:
$ g++ serialw00.cpp -lboost_serialization -o ./serialw00.x
$ ./serialw00.x
CapitalMap name = "EuroCapitals" version = 42
Paris is the capital of France
Berlin is the capital of Germany
Madrid is the capital of Spain
CapitalMap name = "EuroCapitals" version = 42
Paris is the capital of France
Berlin is the capital of Germany
Madrid is the capital of Spain
$
More details here: SO_q_523872
Upvotes: 3