pepero

Reputation: 7513

do serialization with c++ fstream

I am trying to do serialization with fstream. The stream layout is "IndexLengthDataIndexLengthData...", e.g. 11c22cc33ccc. When reading the file back, the input stream reads "11" as a single number for the index.

The index is within [1, INT_MAX]. The length is limited to 516.

Can I do this without using a separator, e.g. "@" or "#", between index and length?

#include <fstream>
#include <iostream>

int main() {
  std::ofstream ofs;
  ofs.open("myfile.txt", std::ofstream::out | std::ofstream::trunc);
  for(int i = 1; i <= 10; ++i) {
    ofs << i; // for index
    ofs << i; // for length
    for (int j = 0; j < i; ++j) ofs << 'c';
  }
  ofs.close();
  std::ifstream ifs;
  ifs.open("myfile.txt", std::ifstream::in);
  for (int i = 0; !ifs.eof() && ifs.good(); ++i) {
    int index = 0, length = 0;
    ifs >> index;
    ifs >> length;
    std::cout << "index is " << index << " length is " << length << std::endl;
    // Jump to the next entry
    ifs.seekg(length, std::ios_base::cur);
  }
}

Upvotes: 0

Views: 821

Answers (1)

Ped7g

Reputation: 16596

Yes, if you use fixed-size formatting, say 10 chars for the index and 3 chars for the length. Your example would then be encoded as:
"         1  1c         2  2cc         3  3ccc".

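A minimal sketch of that fixed-width variant, assuming the 10/3 widths mentioned above. The names writeRecord, readRecord, kIndexWidth and kLengthWidth are made up for illustration. Since every field has a known width, the reader can take exactly that many characters and never needs a separator.

```cpp
#include <exception>
#include <iomanip>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Assumed field widths: 10 chars fit INT_MAX, 3 chars fit a length up to 516.
constexpr int kIndexWidth = 10;
constexpr int kLengthWidth = 3;

// Writes one record as <index, 10 chars><length, 3 chars><data>,
// both numbers right-aligned and padded with spaces.
void writeRecord(std::ostream& out, int index, const std::string& data) {
    out << std::setw(kIndexWidth) << index
        << std::setw(kLengthWidth) << data.size()
        << data;
}

// Reads one record; returns false at end of stream or on malformed input.
bool readRecord(std::istream& in, int& index, std::string& data) {
    std::string indexField(kIndexWidth, ' ');
    std::string lengthField(kLengthWidth, ' ');
    if (!in.read(&indexField[0], kIndexWidth)) return false;
    if (!in.read(&lengthField[0], kLengthWidth)) return false;
    try {
        index = std::stoi(indexField);         // leading spaces are skipped
        data.resize(std::stoul(lengthField));
    } catch (const std::exception&) {
        return false;                          // field was not a number
    }
    if (data.empty()) return true;
    return static_cast<bool>(in.read(&data[0], static_cast<std::streamsize>(data.size())));
}
```

Calling writeRecord(out, 1, "c") on an empty stream produces the 14 characters `         1  1c`, and readRecord can be called in a loop until it returns false.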
Also, you mention fstream, but it looks like you are pursuing a text (human-readable) serialization, not a binary one. If that is the case, and you don't need a truly human-readable form, you can mark the first byte of the length with some bit. Digits in ASCII are encoded as 0x30 to 0x39, so you can, for example, set the 0x40 bit without destroying the digit value. Then your example would look like:
1qc2rcc3sccc (q = 0x71 = 0x40|0x31 = 0x40|'1')

For a longer value it would look like 113q00123456789 ... ARGH, I wanted to serialize the 10-character string "0123456789", and look what happened: the reader gets length 100 instead of 10 (or even worse 100123456789, if it does not limit the number of digits). So both the start and the end of the length have to be marked in some way, for example by using bit 0x80 to mark the end of the length:
1\361c2\362cc3\363ccc (\361 = 0xF1 = 0x40|0x80|0x31 = 0x40|0x80|'1')

Longer value second try:
113q°0123456789 (index 113, length 10, data "0123456789", q = 0x40|'1', ° = 0x80|'0').

Wouldn't you rather use a binary form? It would be shorter.


BTW, if you don't mind marking values but want to stay in 7-bit ASCII, you can mark the ends of both index and length (instead of the start and end of the length), using only 0x40. So 11c would become qqc, and 113 10 0123456789 would become 11s1p0123456789.
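
A rough sketch of this 7-bit scheme, where the last digit of each number carries the 0x40 bit. The function names writeMarked, readMarked, writeEntry and readEntry are hypothetical, and the sketch assumes values small enough not to overflow `unsigned` (it does no overflow checking).

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes a decimal number whose last digit has bit 0x40 set ('0'..'9' -> 'p'..'y').
void writeMarked(std::ostream& out, unsigned value) {
    std::string digits = std::to_string(value);
    digits.back() = static_cast<char>(digits.back() | 0x40);
    out << digits;
}

// Reads digits up to and including the marked one; returns false on bad input.
bool readMarked(std::istream& in, unsigned& value) {
    value = 0;
    char c;
    while (in.get(c)) {
        const bool last = (c & 0x40) != 0;                 // marked digit ends the number
        const char digit = static_cast<char>(c & ~0x40);   // strip the marker bit
        if (digit < '0' || digit > '9') return false;
        value = value * 10 + static_cast<unsigned>(digit - '0');
        if (last) return true;
    }
    return false;                                          // stream ended before a marked digit
}

// Writes index, length and raw data with no separator between them.
void writeEntry(std::ostream& out, unsigned index, const std::string& data) {
    writeMarked(out, index);
    writeMarked(out, static_cast<unsigned>(data.size()));
    out << data;
}

// Reads one entry back; returns false at end of stream or on damaged input.
bool readEntry(std::istream& in, unsigned& index, std::string& data) {
    unsigned length = 0;
    if (!readMarked(in, index) || !readMarked(in, length)) return false;
    data.resize(length);
    if (length == 0) return true;
    return static_cast<bool>(in.read(&data[0], static_cast<std::streamsize>(length)));
}
```

With these helpers, writeEntry(out, 113, "0123456789") emits 11s1p0123456789, matching the encoding described above.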


Binary write/read with platform-agnostic endianness (i.e. a file written on a little-endian platform will also work on a big-endian one):

#include <iostream>
#include <cstdint>
#include <vector>

/**
 * Writes index+length+data in binary form to "out" stream.
 * 
 * Returns number of bytes written to out stream.
 * 
 * Does no data validation (the variable types are only limits for input data).
 * 
 * writeData and readData are done in an endianness-agnostic way.
 * So a file saved on a big-endian platform will be restored correctly on a little-endian platform.
 **/
size_t writeData(std::ostream & out,
        const uint32_t index, const uint16_t length, const uint8_t *data) {
    // Write index and length byte by byte, least significant byte first,
    // so the file layout does not depend on the endianness of the host platform.
    out.put((char)((index>>0)&0xFF));
    out.put((char)((index>>8)&0xFF));
    out.put((char)((index>>16)&0xFF));
    out.put((char)((index>>24)&0xFF));
    out.put((char)((length>>0)&0xFF));
    out.put((char)((length>>8)&0xFF));
    // If any data, write them to stream
    if (0 < length) out.write(reinterpret_cast<const char *>(data), length);
    return 4 + 2 + length;
}

/**
 * Read data from stream "in" stream into variables index, length and data.
 * 
 * If "in" doesn't contain enough bytes for index+length, zero index/length is returned
 * 
 * If "in" contains more than index+length bytes, but the data are shorter than length,
 * then "repaired" shorter data are returned with shorter "length" (not the read one).
 **/
void readData(std::istream & in,
        uint32_t & index, uint16_t & length, std::vector<uint8_t> & data) {
    // clear current values in index, length, data
    index = length = 0; data.clear();
    // read index+length header from stream
    uint8_t buffer[6];
    in.read(reinterpret_cast<char *>(buffer), 6);
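    if (6 != in.gcount()) return;   // header data (index+length) not found
    // Reassemble read bytes together to index/length numbers in host endianness.
    // Cast to uint32_t before shifting: a uint8_t promoted to int and shifted by 24 can overflow.
    index = (uint32_t(buffer[0])<<0) | (uint32_t(buffer[1])<<8)
          | (uint32_t(buffer[2])<<16) | (uint32_t(buffer[3])<<24);
    length = uint16_t((buffer[4]<<0) | (buffer[5]<<8));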
    if (0 == length) return;    // zero length, nothing more to read
    // Read the binary data of expected length
    data.resize(length);  // reserve memory for read
    in.read(reinterpret_cast<char *>(data.data()), length);
    if (length != in.gcount()) {    // data read didn't have expected length, damaged file?
        // TODO you may want to handle damaged data in other way, like returning index 0
        // This code will simply accept shorter data, and "repair" length
        length = static_cast<uint16_t>(in.gcount());  // gcount() < original length here, so it fits
        data.resize(length);
    }
}

To see it in action, you may try it on cpp.sh.
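
The byte-by-byte idea can also be isolated into two small helpers, which may make the technique easier to see on its own. This is only a sketch: putLE and getLE are hypothetical names, not part of the code above, and they handle values of up to 4 bytes.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Writes the low `bytes` bytes of value, least significant byte first,
// regardless of the byte order of the host CPU.
void putLE(std::ostream& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.put(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Reads `bytes` bytes (least significant first) and reassembles the number.
// Returns false if the stream ends too early.
bool getLE(std::istream& in, std::uint32_t& value, int bytes) {
    value = 0;
    for (int i = 0; i < bytes; ++i) {
        char c;
        if (!in.get(c)) return false;
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        value |= static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << (8 * i);
    }
    return true;
}
```

A round trip through a std::stringstream, for example putLE(s, 516, 2) followed by getLE(s, length, 2), should give back 516 on any platform, because only shifts and masks touch the value and no raw memory is copied.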

Upvotes: 2
