Reputation: 16825
In my schema, as described in the below test data generation example, I want to know a good way to:
Dereference all instances of Favourites that have reference keys to instances of Pictures that have been deleted. Just delete any Favourite that links to a deleted picture.
The Person
class is a user
The Picture
class is something that can be a Favourite
The Favourite
class is an example of the Link-Model way of having many-to-many relationships.
Why this question? First I hope it doesn't fall out of the scope here, second because this can happen and third because it's interesting.
How? Let's say that a person can have up to thousands favourites, something like Likes are on social networks or to make it worse, orders, accounts or invalid data in a scientific application. In our example for some reason (and these reasons happen) a person is experiencing lot of dead favourite link, or I do know, that there are dead favourites.
What would be a good way to do this, reducing ndb.get()
operations and not iterating through every Favourite.
Lets not complicate things. Lets make the assumption that we have only one user suffering from dead favourites. He has a class of Person and stubbed user_id property of '123'.
In the following example you can use the following handlers and their corresponding functions.
import time
import sys
import logging
import random
import cgi
import webapp2
from google.appengine.ext import ndb
class Person(ndb.Expando):
pass
class Picture(ndb.Expando):
pass
class Favourite(ndb.Expando):
user_id = ndb.StringProperty(required=True)
#picture = ndb.KeyProperty(kind=Picture, required=True)
pass
class GenerateDataHandler(webapp2.RequestHandler):
def get(self):
try:
number_of_models = abs(int(cgi.escape(self.request.get('n'))))
except:
number_of_models = 10
logging.info("GET ?n=parameter not defined. Using default.")
pass
user_id = '123' #stub
person = Person.query().filter(ndb.GenericProperty('user_id') == user_id).get()
if not person:
person = Person()
person.user_id = user_id #Stub
person.put()
logging.info("Created Person instance")
if not self._gen_data(person, number_of_models):
return
self.response.write("Data generated successfully")
def _gen_data(self, person, number_of_models):
first, last = Picture.allocate_ids(number_of_models)
picture_keys = [ndb.Key(Picture, id) for id in range(first, last+1)]
pictures = []
favourites = []
for picture_key in picture_keys:
picture = Picture(key=picture_key)
pictures.append(picture)
favourite = Favourite(parent=person.key,
user_id=person.user_id,
picture=picture_key
)
favourites.append(favourite)
entities = favourites
entities[1:1] = pictures
ndb.put_multi(entities)
return True
class CorruptDataHandler(webapp2.RequestHandler):
def get(self):
if not self._corrupt_data(0.5):#50% corruption
return
self.response.write("Data corruption completed successfully")
def _corrupt_data(self, n):
picture_keys = Picture.query().fetch(99999, keys_only=True)
random_picture_keys = random.sample(picture_keys, int(float(len(picture_keys))*n))
ndb.delete_multi(random_picture_keys)
return True
class FixDataHandler(webapp2.RequestHandler):
def get(self):
user_id = '123' #stub
person = Person.query().filter(ndb.GenericProperty('user_id') == user_id).get()
self._dereference(person)
def _dereference(self, person):
#Here if where you implement your answer
Separate handlers due to eventual consistency in the NDB Datastore. More info: GAE put_multi() entities using backend NDB
Of course I am posting an answer as well to show that I tried something before posting this.
Upvotes: 0
Views: 377
Reputation: 15143
A ReferenceProperty is just a key, so if you have the key of the deleted Person, you can use that to query the Favourite.
Otherwise, there's no easy way. You'll have to filter through all Favourites and find ones that have an invalid Picture. It's very simple in a mapreduce job, but could be an expensive query if you have a lot of Favourites.
Upvotes: 1
Reputation: 1133
You could use a pre delete hook (look here for a way to implement it) Of course this could be done easier if you use the NDB API instead of the Datastore API (hooks on NDB), but then you'll have to change the way you make the referenes
Upvotes: 1