Reputation: 6896
I planning my inaugural data load into GAE and really want to get my ducks in a row. The language I am using is Python. My question is about the storing of storing read only temporal data in app engine.
I have a spreadsheet with ~50k rows times 30 columns.
It is historic data and the table will be read-only.
I envisage a lot of sorting by day of week (show me weekends in July and so on) and also time (not necessarily always with date - but sometimes with) so my initial thought would be to create an extra row and fill that in a previously-computed "day of week".
eg.
date, time, dayofweek, event, geolocation, etc
27-02-2009, 08:20:00, 'Friday' ...
That date and time shown above reflects how it is coming from the spreadsheet ATM.
If I am going to go to the trouble of computing the "dayofweek", which is I presume a good idea, and given your experience of GAE's Datastore models should I:
-just have a single datetime? (and let GAE work out the day of week during future sorts/requests?) -create a dayofweek but have a single daydate? '2009-02-27 08:10:00' -store day of week as an integer instead? (0 = Monday in Python IIRC) -store date as '2009-02-27' instead?
Output will always be English, and may also be JSON. It is for a data visulatisation study and some extra temporal animation could be done on the client - but my primary concern is to keep the work done in GAE down, which I admit might just be a n00bish fetish.
Upvotes: 3
Views: 532
Reputation: 535
I agree that you definitely want to break down your date in ways that make sense for queries. Building some of that information into the key can be useful as suggested as well. It is not always possible though. For example, there may be competing concerns for the key. Yet another option is to build a query index just based on date and use ancestor/parent relationship. Then you can do a keys only query on the date oriented entities and use them to get the parents.
One last thing, I recommend that, however you break down the date, do store the datetime in full as well. You may want his later if there has been some mistake or if you change your scheme.
Upvotes: 1
Reputation: 101149
If you plan to query based on day-of-week - that is, you need to select several Fridays, but not the neighbouring days - you should absolutely break this out into a separate property, as it'll make it possible to do efficient queries for it.
To do the query 'Fridays in July', as you suggest, you've got several options:
Upvotes: 3
Reputation: 3866
If you want maximum performance you can add dayofyear and year in your key name. For example what I do is that i construct keys like that: analytic:124:2011
Then it is quite easy to get the last 30 days let's. Just create a list generator that will create a list of keys for the 30 last days
from datetime import datetime
dof = datetime.now().timetuple()
keys = [db.Key.from_path("Analytic","%i:%i"%(dof-i,2011)) for in in xrange(0,30)]
analytics = db.get(keys)
Upvotes: 1
Reputation: 2936
You should just have a single datetime. This will allow your system to be utilized by other systems as the need arises in the future. I also believe that computable items should not be stored in the DB as the parameters from some computations can evolve over time.
HTH,
Upvotes: 1