gowithefloww

Reputation: 2251

Fail to retrieve JSON entity from Google Datastore using BigQuery

I'm trying to export an entity from Google Cloud Datastore to BigQuery (and then to CSV).

When I create the dataset, everything works except that one property is missing: the one declared as ndb.JsonProperty(), which is supposed to hold JSON.

Looking at this property in the Datastore viewer, it appears to be encoded JSON (e.g. ...0NzIyMDUyODkiLCAidXNlcl9uYW1lIjogIlZpbmNlbnQgR...).

My only goal is to export this entity from the Datastore, using BigQuery, Python or whatever else is needed, so that I can explore the data.

Upvotes: 1

Views: 333

Answers (1)

snakecharmerb

Reputation: 55933

ndb JsonProperty values are stored in the datastore as blobs:

JsonProperty: Value is a Python object (such as a list or a dict or a string) that is serializable using Python's json module; Cloud Datastore stores the JSON serialization as a blob.

BigQuery discards blob data:

Blob: BigQuery discards these values when loading the data.

One possible workaround is to add computed properties (ndb.ComputedProperty) to your model that extract the data you're interested in, in a format that BigQuery will accept.

For example, say you are storing a dict like this in your JsonProperty:

data = {'foo': 'bar', 'baz': 'quux'}

Let's say you are interested in the value corresponding to the key foo. You can create a ComputedProperty that returns that value, and it will be picked up by your BigQuery export. Note that after adding a ComputedProperty you must re-save all existing model instances to populate the new property.

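# App Engine's bundled ndb; with Cloud NDB use `from google.cloud import ndb`.
from google.appengine.ext import ndb

class MyModel(ndb.Model):

    blob = ndb.JsonProperty()
    # Look up the key 'foo' (whose value is 'bar') in the stored dict.
    foo = ndb.ComputedProperty(lambda self: self.blob.get('foo'))

obj = MyModel(blob=data)
obj.put()
obj.foo  # evaluates to 'bar'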
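If the goal is only to explore the data, another route may be to decode the blob yourself in Python and write the fields you need to CSV. The sketch below is a minimal illustration, not a full export: it assumes the string shown in the Datastore viewer is the base64 encoding of an uncompressed UTF-8 JSON document (the fragment in the question looks like one), and the helper names decode_json_blob and rows_to_csv are hypothetical. Fetching the entities is left out.

```python
import base64
import csv
import io
import json


def decode_json_blob(encoded):
    """Decode a base64 string into the Python object it serializes.

    Assumption: the blob is base64-encoded, uncompressed UTF-8 JSON.
    """
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def rows_to_csv(rows, fieldnames):
    """Render a list of dicts as CSV text, keeping only the chosen keys."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fieldnames,
        extrasaction="ignore",  # silently skip keys not listed in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


# Hypothetical sample: encode the dict from the answer so the round trip can be checked.
sample = {"foo": "bar", "baz": "quux"}
encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")

decoded = decode_json_blob(encoded)
csv_text = rows_to_csv([decoded], fieldnames=["foo", "baz"])
print(csv_text)
```

Keys missing from a given row are written as empty cells, so rows with slightly different JSON shapes can still share one header.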

Upvotes: 3
