Kenenbek Arzymatov

Reputation: 9109

Write protobuf objects to JSON file

I have a file, old.JSON, that looks like this:

[{
    "id": "333333",
    "creation_timestamp": 0,
    "type": "MEDICAL",
    "owner": "MED.com",
    "datafiles": ["stomach.data", "heart.data"]
}]

Then I create an object from it, based on this .proto definition:

message Dataset {
  string id = 1;
  uint64 creation_timestamp = 2;
  string type = 3;
  string owner = 4;
  repeated string datafiles = 6;
}

Now I want to save this object back to another .JSON file. I did this:

import json
from google.protobuf.json_format import MessageToJson

with open("new.json", 'w') as jsfile:
    json.dump(MessageToJson(item), jsfile)

As a result I have:

"{\n  \"id\": \"333333\",\n  \"type\": \"MEDICAL\",\n  \"owner\": \"MED.com\",\n  \"datafiles\": [\n    \"stomach.data\",\n    \"heart.data\"\n  ]\n}"

How can I make this file look like old.JSON?

Upvotes: 6

Views: 12604

Answers (1)

Kenny Ostrom

Reputation: 5871

The weird escaping comes from converting the data to JSON twice: the second conversion has to escape the JSON characters produced by the first one. A detailed explanation follows.

https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.json_format-pysrc

"""Contains routines for printing protocol messages in JSON format.

Simple usage example:

  # Create a proto object and serialize it to a json format string.
  message = my_proto_pb2.MyMessage(foo='bar')
  json_string = json_format.MessageToJson(message)

  # Parse a json format string to proto object.
  message = json_format.Parse(json_string, my_proto_pb2.MyMessage())
"""

Also, further down in the same file:

def MessageToJson(message, including_default_value_fields=False):
  ...
  Returns:
    A string containing the JSON formatted protocol buffer message.

So this function returns exactly one object, of type str. That string contains JSON-formatted text, but as far as Python is concerned it is still just a string.

You then pass that string to a function that takes a Python object (not JSON text) and serializes it to JSON:

https://docs.python.org/3/library/json.html

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object) using this conversion table.

Okay, so how exactly would you encode a string as JSON? It has to become a single JSON string literal, so any characters that have special meaning inside one (double quotes, backslashes, and control characters such as newlines) must be escaped. You can try this with an online tool, such as http://bernhardhaeussner.de/odd/json-escape/ or http://www.freeformatter.com/json-escape.html

If you paste the starting JSON from the top of your question into one of them and ask it to escape it, you get back almost exactly what you see at the bottom of your question. Cool, everything worked correctly!

(I say almost because one of those links adds some newlines on its own, for no apparent reason. If you encode it with the first link, then decode it with the second, it is exact.)
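You can also reproduce the double encoding without protobuf. This is a small sketch using only the standard json module; the dict is a hypothetical stand-in for the fields MessageToJson prints, and indent=2 mimics the two-space indentation visible in the output in the question.

```python
import json

# Hypothetical data mirroring two fields of old.JSON.
record = {"id": "333333", "type": "MEDICAL"}

# First serialization: dict -> JSON text (a plain Python str).
json_text = json.dumps(record, indent=2)
print(json_text)
# {
#   "id": "333333",
#   "type": "MEDICAL"
# }

# Second serialization: json.dumps sees a str, so it emits one JSON
# string literal and escapes the quotes and newlines inside it.
double_encoded = json.dumps(json_text)
print(double_encoded)
# "{\n  \"id\": \"333333\",\n  \"type\": \"MEDICAL\"\n}"

# One json.loads only removes the outer layer (you get the str back);
# a second json.loads recovers the original dict.
assert json.loads(double_encoded) == json_text
assert json.loads(json.loads(double_encoded)) == record
```

The second printed line has the same shape as the file contents in the question, which is the signature of serializing twice.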

But that's not the answer you wanted, because you didn't want to double-jsonify the data structure. You just wanted to serialize it to json once, and write that to a file:

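from google.protobuf.json_format import MessageToJson

# item is the Dataset message from the question.
with open("new.json", 'w') as jsfile:
    # MessageToJson already returns JSON text (a str), so write it as-is
    # instead of passing it through json.dump a second time.
    actual_json_text = MessageToJson(item)
    jsfile.write(actual_json_text)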

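Addendum: MessageToJson might need additional parameters to produce output matching old.JSON:
including_default_value_fields=True keeps fields that hold their default value, such as creation_timestamp = 0, which is otherwise omitted.
preserving_proto_field_name=True keeps the .proto field names (creation_timestamp) instead of converting them to lowerCamelCase (creationTimestamp).
(See the comments and links below.)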
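One remaining difference is that old.JSON is a JSON array holding one object, while MessageToJson returns the text of a single object. A minimal sketch of one way to get the array shape, assuming you have one MessageToJson string per message: parse each string once with json.loads, then serialize the whole list once with json.dump. write_records is a hypothetical helper, not part of protobuf. json_format also provides MessageToDict, which returns a dict directly and would make the json.loads step unnecessary.

```python
import json


def write_records(json_texts, path, indent=4):
    """Write a list of JSON object texts to path as a single JSON array.

    Hypothetical helper: json_texts is assumed to hold strings such as
    the ones returned by MessageToJson(item).
    """
    # Parse each text once so json.dump receives dicts rather than str
    # objects; otherwise every string would be escaped a second time.
    records = [json.loads(text) for text in json_texts]
    with open(path, "w") as f:
        # Serialize the whole list exactly once.
        json.dump(records, f, indent=indent)
```

For example, write_records([MessageToJson(item)], "new.json") would write a one-element array, like old.JSON.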

Upvotes: 7
