Reputation: 7513
I am trying to serialize data with fstream. The stream format is "IndexLengthDataIndexLengthData...", e.g. 11c22cc33ccc. The problem is that when reading the file, the input stream reads "11" as a single number for the index.
The index is within [1, INT_MAX]. The length is limited to 516.
Can I do this without a separator, e.g. "@" or "#", between index and length?
#include <fstream>
#include <iostream>

int main() {
std::ofstream ofs;
ofs.open("myfile.txt", std::ofstream::out | std::ofstream::trunc);
for(int i = 1; i <= 10; ++i) {
ofs << i; // for index
ofs << i; // for length
for (int j = 0; j < i; ++j) ofs << 'c';
}
ofs.close();
std::ifstream ifs;
ifs.open("myfile.txt", std::ifstream::in);
for (int i = 0; !ifs.eof() && ifs.good(); ++i) {
int index = 0, length = 0;
ifs >> index;
ifs >> length;
std::cout << "index is " << index << ", length is " << length << std::endl;
// Jump to the next entry
ifs.seekg(length, std::ios_base::cur);
}
}
Upvotes: 0
Views: 821
Reputation: 16596
Yes, if you use fixed-size formatting, e.g. 10 characters for the index and 3 characters for the length. Your example would then be encoded as:
"         1  1c         2  2cc         3  3ccc".
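The fixed-width idea can be sketched as follows. This is only a minimal illustration, assuming a 10-character index field and a 3-character length field as described above; the helper names writeFixed and readFixed are hypothetical and not part of the question's code.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: appends one record as a 10-character index,
// a 3-character length, and then the raw data.
void writeFixed(std::ostream& out, int index, const std::string& data) {
    assert(index >= 1 && data.size() <= 999);  // values must fit the field widths
    out << std::setw(10) << index << std::setw(3) << data.size() << data;
}

// Hypothetical helper: reads one record.
// Returns false at end of input or when the header or data is malformed.
bool readFixed(std::istream& in, int& index, std::string& data) {
    char header[13];  // 10 characters of index + 3 characters of length
    if (!in.read(header, sizeof header)) return false;
    try {
        // std::stoi skips the leading padding spaces of each field.
        index = std::stoi(std::string(header, 10));
        const int length = std::stoi(std::string(header + 10, 3));
        if (length < 0) return false;
        data.resize(static_cast<std::size_t>(length));
        if (length > 0 && !in.read(&data[0], length)) return false;
        return true;
    } catch (const std::exception&) {
        return false;  // a field did not contain a number
    }
}
```

Because the header always occupies exactly 13 characters, the reader never has to guess where the index ends and the length begins, at the cost of padding in every record.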
Also, you talk about fstream, but it looks like you are after a text (human-readable) serialization, not a binary one. If that is the case, but you don't need a truly human-readable form, you can mark the first byte of the length with some bit: digits in ASCII are encoded as 0x30 to 0x39, so you can, for example, set the 0x40 bit without destroying the digit value. Then your example would look like:
1qc2rcc3sccc (q = 0x71 = 0x40|0x31 = 0x40|'1')
A longer value would look like: 113q00123456789 ... ARGH, I wanted to serialize the 10-character string "0123456789", and look what happened: I got length 100 instead of 10 (or even worse, 100123456789 if you did not limit the length). So both the start and the end of the length have to be tainted in some way, maybe using bit 0x80 to mark the end of the length.
1\361c2\362cc3\363ccc (\361 = 0xF1 = 0x40|0x80|0x31 = 0x40|0x80|'1')
Longer value, second try:
113q°0123456789 (index 113, length 10, data "0123456789", q = 0x40|'1', ° = 0x80|'0').
Wouldn't you rather use a binary form? It would be shorter.
BTW, if you don't mind tainting values but want to stay in 7-bit ASCII, you can taint not the start and end of the length, but the last digit of both index and length, and only with 0x40. So 11c would become qqc, and 113 10 0123456789 would become 11s1p0123456789.
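The 7-bit variant can be sketched roughly like this. It is a minimal illustration that works on an in-memory string rather than a file; the helper names taintNumber, untaintNumber, encodeTainted and decodeTainted are hypothetical, and overflow of very long digit runs is not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decimal digits of value with bit 0x40 set on the last digit.
std::string taintNumber(unsigned value) {
    std::string digits = std::to_string(value);
    digits.back() = static_cast<char>(digits.back() | 0x40);
    return digits;
}

// Hypothetical helper: parses a tainted number starting at pos and advances pos past it.
// Returns false if a non-digit is found or the terminating tainted digit is missing.
bool untaintNumber(const std::string& in, std::size_t& pos, unsigned& value) {
    value = 0;
    while (pos < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[pos++]);
        const bool last = (c & 0x40) != 0;                          // tainted digit ends the number
        const unsigned char digit = static_cast<unsigned char>(c & 0xBF);  // clear bit 0x40
        if (digit < '0' || digit > '9') return false;
        value = value * 10 + static_cast<unsigned>(digit - '0');    // overflow not checked in this sketch
        if (last) return true;
    }
    return false;
}

// One record: tainted index, tainted length, raw data.
std::string encodeTainted(unsigned index, const std::string& data) {
    assert(index >= 1);  // the question states the index is within [1, INT_MAX]
    return taintNumber(index) + taintNumber(static_cast<unsigned>(data.size())) + data;
}

// Decodes one record starting at pos; returns false at end of input or on malformed data.
bool decodeTainted(const std::string& in, std::size_t& pos,
                   unsigned& index, std::string& data) {
    unsigned length = 0;
    if (!untaintNumber(in, pos, index) || !untaintNumber(in, pos, length)) return false;
    if (in.size() - pos < length) return false;  // data shorter than announced
    data = in.substr(pos, length);
    pos += length;
    return true;
}
```

Since the length is known before the data is read, the data itself may contain digits or tainted-looking bytes without confusing the parser.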
Binary write/read with platform-agnostic endianness (i.e. a file written on a little-endian platform will work on a big-endian one):
#include <iostream>
#include <cstddef>
#include <cstdint>
#include <vector>
/**
* Writes index+length+data in binary form to "out" stream.
*
* Returns number of bytes written to out stream.
*
* Does no data validation (the variable types are only limits for input data).
*
* writeData and readData are done in an endianness-agnostic way.
* So a file saved on a big-endian platform will be restored correctly on a little-endian platform.
**/
size_t writeData(std::ostream & out,
const uint32_t index, const uint16_t length, const uint8_t *data) {
// Write index and length bytes to the out stream, least significant byte first, independent of host endianness.
out.put((char)((index>>0)&0xFF));
out.put((char)((index>>8)&0xFF));
out.put((char)((index>>16)&0xFF));
out.put((char)((index>>24)&0xFF));
out.put((char)((length>>0)&0xFF));
out.put((char)((length>>8)&0xFF));
// If any data, write them to stream
if (0 < length) out.write(reinterpret_cast<const char *>(data), length);
return 4 + 2 + length;
}
/**
* Read data from stream "in" stream into variables index, length and data.
*
* If "in" doesn't contain enough bytes for index+length, zero index/length is returned
*
* If "in" contains more than index+length bytes, but the data are shorter than length,
* then "repaired" shorter data are returned with shorter "length" (not the read one).
**/
void readData(std::istream & in,
uint32_t & index, uint16_t & length, std::vector<uint8_t> & data) {
// clear current values in index, length, data
index = length = 0; data.clear();
// read index+length header from stream
uint8_t buffer[6];
in.read(reinterpret_cast<char *>(buffer), 6);
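if (6 != in.gcount()) return; // header data (index+length) not found
// Reassemble the read bytes into index/length numbers in host endianness.
// Cast before shifting: a byte promoted to int and shifted left by 24 can overflow for values >= 0x80.
index = (uint32_t(buffer[0])<<0) | (uint32_t(buffer[1])<<8) | (uint32_t(buffer[2])<<16) | (uint32_t(buffer[3])<<24);
length = uint16_t((buffer[4]<<0) | (buffer[5]<<8));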
if (0 == length) return; // zero length, nothing more to read
// Read the binary data of expected length
data.resize(length); // reserve memory for read
in.read(reinterpret_cast<char *>(data.data()), length);
if (length != in.gcount()) { // data read didn't have expected length, damaged file?
// TODO you may want to handle damaged data in other way, like returning index 0
// This code will simply accept shorter data, and "repair" length
length = static_cast<uint16_t>(in.gcount()); // always fits, gcount() is less than the original length here
data.resize(length);
}
}
To see it in action, you may try it on cpp.sh.
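A round trip through an in-memory stream is a quick way to check the byte layout locally. The sketch below is a condensed, self-contained variant of the same layout (4 bytes of index, 2 bytes of length, both least significant byte first, then the data); the helper names putLE, getLE, encodeBinary and decodeBinary are hypothetical and not the functions from the answer above.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes the low `bytes` bytes of value, least significant first.
void putLE(std::ostream& out, std::uint32_t value, int bytes) {
    assert(bytes >= 1 && bytes <= 4);
    for (int i = 0; i < bytes; ++i)
        out.put(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Hypothetical helper: reads `bytes` bytes, least significant first.
// Returns false if the stream ends early.
bool getLE(std::istream& in, std::uint32_t& value, int bytes) {
    assert(bytes >= 1 && bytes <= 4);
    value = 0;
    for (int i = 0; i < bytes; ++i) {
        const int c = in.get();  // 0..255, or eof()
        if (c == std::istream::traits_type::eof()) return false;
        value |= static_cast<std::uint32_t>(c) << (8 * i);
    }
    return true;
}

// Encodes one record into a byte string: index (4 bytes), length (2 bytes), data.
std::string encodeBinary(std::uint32_t index, const std::string& data) {
    assert(data.size() <= 0xFFFF);  // length must fit into 2 bytes
    std::ostringstream out;
    putLE(out, index, 4);
    putLE(out, static_cast<std::uint32_t>(data.size()), 2);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    return out.str();
}

// Decodes one record; returns false on a missing header or truncated data.
bool decodeBinary(std::istream& in, std::uint32_t& index, std::string& data) {
    std::uint32_t length = 0;
    if (!getLE(in, index, 4) || !getLE(in, length, 2)) return false;
    data.resize(length);
    if (length == 0) return true;
    in.read(&data[0], static_cast<std::streamsize>(length));
    return static_cast<std::uint32_t>(in.gcount()) == length;
}
```

For example, encodeBinary(0x01020304, "ab") yields the eight bytes 04 03 02 01 02 00 'a' 'b' on any host, which is what makes the file portable between little-endian and big-endian machines. Unlike the answer's readData, this sketch rejects truncated data instead of repairing the length.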
Upvotes: 2