Reputation: 23
I'm writing a tool for my master's thesis that needs to read protobuf data streams from a file. Until now I worked exclusively on Mac OS and everything was fine, but now I'm trying to run the tool on Windows too.
Sadly, on Windows I am not able to read multiple consecutive messages from a single stream. I tried to narrow the problem down and came up with the following small program that reproduces it.
#include "tokens.pb.h"
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <fstream>
int main(int argc, char* argv[])
{
    std::fstream tokenFile(argv[1], std::ios_base::in);
    if(!tokenFile.is_open())
        return -1;
    google::protobuf::io::IstreamInputStream iis(&tokenFile);
    google::protobuf::io::CodedInputStream cis(&iis);
    while(true){
        google::protobuf::io::CodedInputStream::Limit l;
        unsigned int msgSize;
        if(!cis.ReadVarint32(&msgSize))
            return 0; // probably reached eof
        l = cis.PushLimit(msgSize);
        tokenio::Union msg;
        if(!msg.ParseFromCodedStream(&cis))
            return -2; // couldn't read msg
        if(cis.BytesUntilLimit() > 0)
            return -3; // msg was not read completely
        cis.PopLimit(l);
        if(!msg.has_string() &&
           !msg.has_file() &&
           !msg.has_token() &&
           !msg.has_type())
            return -4; // msg contains no data
    }
    return 0;
}
On Mac OS this runs fine and returns 0 after reading the whole file as I expected.
On Windows the first message is read without problems. For the second message, ParseFromCodedStream still returns true but does not read any data. This leaves a BytesUntilLimit value larger than 0 and a return value of -3. Of course, the message also does not contain any usable data. Any further reads from cis also fail, as if the end of the stream had been reached, even though the file has not been read completely.
I also tried using a FileInputStream with a file descriptor for input, with the same result. Removing Push/PopLimit and instead reading the data using ReadString calls with explicit message sizes, then parsing from that string, also didn't help.
The following protobuf file was used.
package tokenio;
message TokenType {
    required uint32 id = 1;
    required string name = 2;
}
message StringInstance {
    required string value = 1;
    optional uint64 id = 2;
}
message BeginOfFile {
    required uint64 name = 1;
    optional uint64 type = 2;
}
message Token {
    required uint32 type = 1;
    required uint32 offset = 2;
    optional uint32 line = 3;
    optional uint32 column = 4;
    optional uint64 value = 5;
}
message Union {
    optional TokenType type = 1;
    optional StringInstance string = 2;
    optional BeginOfFile file = 3;
    optional Token token = 4;
}
And this is a sample input file.
The input file seems to be OK. At least it's readable by the protobuf editor (on Windows and Mac OS) as well as by the C++ implementation on Mac OS.
The code was tested:
What am I doing wrong?
Upvotes: 2
Views: 2341
Reputation: 52621
Open the file in binary mode:
std::fstream tokenFile(argv[1], std::ios_base::in | std::ios_base::binary);
The default is text mode. On Mac and other Unix-like systems that makes no difference, but on Windows text mode translates CRLF sequences to LF and treats the ^Z character ('\x1A') as an end-of-file indicator. Those bytes can occur by coincidence in a binary stream and cause trouble.
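To illustrate, here is a minimal sketch that does not use the protobuf library at all. It hand-rolls the varint length prefix and reads the frames back from a stream opened with std::ios_base::binary. The helper names (tagByte, readVarint32, writeFrames, readFrames) are made up for this example and are not part of any protobuf API. It also shows one plausible source of a stray 0x1A in this particular stream: in the protobuf wire format, the tag byte of a length-delimited field with number 3 (such as the file field in Union) is (3 << 3) | 2 = 0x1A.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Wire-format tag byte for small field numbers: (field number << 3) | wire type.
constexpr std::uint8_t tagByte(unsigned fieldNumber, unsigned wireType) {
    return static_cast<std::uint8_t>((fieldNumber << 3) | wireType);
}

// Field 3 with wire type 2 (length-delimited) is tagged with ^Z.
static_assert(tagByte(3, 2) == 0x1A, "field 3, wire type 2 encodes as 0x1A");

// Reads one base-128 varint of up to 32 bits.
// Returns std::nullopt on end of file or if the varint is too long.
std::optional<std::uint32_t> readVarint32(std::istream& in) {
    std::uint32_t value = 0;
    for (int shift = 0; shift < 35; shift += 7) {
        const int byte = in.get();
        if (byte == std::char_traits<char>::eof()) return std::nullopt;
        value |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return value;
    }
    return std::nullopt;
}

// Hypothetical helper: writes each frame as <varint size><payload>, in binary mode.
void writeFrames(const std::string& path, const std::vector<std::string>& frames) {
    std::ofstream out(path, std::ios_base::out | std::ios_base::binary);
    for (const std::string& frame : frames) {
        std::uint32_t n = static_cast<std::uint32_t>(frame.size());
        while (n >= 0x80) {
            out.put(static_cast<char>((n & 0x7F) | 0x80));
            n >>= 7;
        }
        out.put(static_cast<char>(n));
        out.write(frame.data(), static_cast<std::streamsize>(frame.size()));
    }
}

// Hypothetical helper: reads all length-prefixed frames from a file.
// The std::ios_base::binary flag is the important part: without it, Windows
// would stop at the first 0x1A byte and rewrite CRLF pairs inside payloads.
std::vector<std::string> readFrames(const std::string& path) {
    std::ifstream in(path, std::ios_base::in | std::ios_base::binary);
    std::vector<std::string> frames;
    while (auto size = readVarint32(in)) {
        std::string payload(*size, '\0');
        in.read(payload.data(), static_cast<std::streamsize>(payload.size()));
        if (static_cast<std::uint32_t>(in.gcount()) != *size) break; // truncated frame
        frames.push_back(std::move(payload));
    }
    return frames;
}
```

Writing a few frames that contain '\x1A' or "\r\n" with writeFrames and reading them back with readFrames should round-trip unchanged on any platform. Dropping the binary flag on the reading side would be expected to reproduce the early end-of-file behaviour on Windows only.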
Upvotes: 2