Justin

Reputation: 287

Creating a Python dictionary-like object from protocol buffers for use in pandas

I currently interface with a server that provides protocol buffers, and I can potentially receive a very large number of messages. My current process for reading the protocol buffers and converting them to a pandas DataFrame (not a necessary step in general, but pandas offers nice tools for analyzing datasets) is:

  1. Read the protocol buffer, which arrives as a Google protobuf object
  2. Convert the protocol buffer to a dictionary using protobuf_to_dict
  3. Use pandas.DataFrame.from_records to get a DataFrame

This works great, but given the large number of messages I read, converting each one to a dictionary and then to pandas is quite inefficient. My question is: is it possible to write a class that makes a Python protobuf object look like a dictionary, so that step 2 can be removed? Any references or pseudocode would be helpful.

Upvotes: 8

Views: 9364

Answers (1)

Zheng Xu

Reputation: 66

You might want to check out the ProtoText Python package. It provides in-place, dict-like operations for accessing your protobuf object.

Example usage: assume you have a Python protobuf object person_obj.

import ProtoText

print(person_obj['name'])      # prints person_obj.name
person_obj['name'] = 'David'   # sets the attribute 'name' to 'David'
# set the attribute 'name' to 'David' again, this time in batch mode
person_obj.update({'name': 'David'})
print('name' in person_obj)    # prints whether the 'name' attribute is set in person_obj
# the 'in' operator is more forgiving than Google's HasField function:
# it does not raise an exception even if the field is not defined
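
If adding a dependency is not an option, a similar read-only view can be sketched by hand on top of collections.abc.Mapping. This is only an illustration, not how ProtoText works internally: MessageView and make_fake_person are hypothetical names, and the stand-in object only mimics the DESCRIPTOR.fields attribute that generated protobuf messages expose, so that the sketch runs without protobuf installed.

```python
from collections.abc import Mapping
from types import SimpleNamespace


class MessageView(Mapping):
    """Read-only, dict-like view over a protobuf message (hypothetical helper)."""

    def __init__(self, message):
        self._message = message
        # DESCRIPTOR.fields holds one descriptor per field declared in the schema.
        self._names = [field.name for field in message.DESCRIPTOR.fields]

    def __getitem__(self, key):
        if key not in self._names:
            raise KeyError(key)
        # No copy is made: the value is read from the message on each access.
        return getattr(self._message, key)

    def __iter__(self):
        return iter(self._names)

    def __len__(self):
        return len(self._names)


def make_fake_person(name, age):
    """Build a stand-in for a generated message (assumption, for illustration only)."""
    descriptor = SimpleNamespace(
        fields=[SimpleNamespace(name="name"), SimpleNamespace(name="age")]
    )
    return SimpleNamespace(DESCRIPTOR=descriptor, name=name, age=age)


view = MessageView(make_fake_person("David", 42))
print(view["name"])     # item access is forwarded to the attribute
print("email" in view)  # unknown fields are reported as missing, no exception
print(dict(view))       # Mapping supplies keys(), items(), get() for free
```

With a real message, nested messages and repeated fields would come back as protobuf objects rather than plain dicts and lists, and unset scalar fields would return their defaults. Whether pandas.DataFrame.from_records accepts such Mapping objects in place of real dicts may depend on the pandas version, so that is worth checking before relying on it.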

Upvotes: 4
