Cups
Cups

Reputation: 6896

What is the best way of describing day of week?

I planning my inaugural data load into GAE and really want to get my ducks in a row. The language I am using is Python. My question is about the storing of storing read only temporal data in app engine.

I have a spreadsheet with ~50k rows times 30 columns.

It is historic data and the table will be read-only.

I envisage a lot of sorting by day of week (show me weekends in July and so on) and also time (not necessarily always with date - but sometimes with) so my initial thought would be to create an extra row and fill that in a previously-computed "day of week".

eg.
date, time, dayofweek, event, geolocation, etc
27-02-2009, 08:20:00, 'Friday' ...

That date and time shown above reflects how it is coming from the spreadsheet ATM.

If I am going to go to the trouble of computing the "dayofweek", which is I presume a good idea, and given your experience of GAE's Datastore models should I:

-just have a single datetime? (and let GAE work out the day of week during future sorts/requests?) -create a dayofweek but have a single daydate? '2009-02-27 08:10:00' -store day of week as an integer instead? (0 = Monday in Python IIRC) -store date as '2009-02-27' instead?

Output will always be English, and may also be JSON. It is for a data visulatisation study and some extra temporal animation could be done on the client - but my primary concern is to keep the work done in GAE down, which I admit might just be a n00bish fetish.

Upvotes: 3

Views: 532

Answers (4)

Jay
Jay

Reputation: 535

I agree that you definitely want to break down your date in ways that make sense for queries. Building some of that information into the key can be useful as suggested as well. It is not always possible though. For example, there may be competing concerns for the key. Yet another option is to build a query index just based on date and use ancestor/parent relationship. Then you can do a keys only query on the date oriented entities and use them to get the parents.

One last thing, I recommend that, however you break down the date, do store the datetime in full as well. You may want his later if there has been some mistake or if you change your scheme.

Upvotes: 1

Nick Johnson
Nick Johnson

Reputation: 101149

If you plan to query based on day-of-week - that is, you need to select several Fridays, but not the neighbouring days - you should absolutely break this out into a separate property, as it'll make it possible to do efficient queries for it.

To do the query 'Fridays in July', as you suggest, you've got several options:

  • Without a separate 'day of week' property, you could do four or five queries for each of the valid days
  • With a separate 'day of week' property, you can do a single query on day-of-week and date range. This will use your inequality filter for the date range, so you couldn't also do an inequality filter on any other property in that query.
  • With separate 'day of week' and 'month' properties, you could do the above query without using any inequality filters.

Upvotes: 3

PanosJee
PanosJee

Reputation: 3866

If you want maximum performance you can add dayofyear and year in your key name. For example what I do is that i construct keys like that: analytic:124:2011

Then it is quite easy to get the last 30 days let's. Just create a list generator that will create a list of keys for the 30 last days

from datetime import datetime
dof = datetime.now().timetuple()
keys = [db.Key.from_path("Analytic","%i:%i"%(dof-i,2011)) for in in xrange(0,30)]
analytics = db.get(keys)

Upvotes: 1

Joe Johnston
Joe Johnston

Reputation: 2936

You should just have a single datetime. This will allow your system to be utilized by other systems as the need arises in the future. I also believe that computable items should not be stored in the DB as the parameters from some computations can evolve over time.

HTH,

Upvotes: 1

Related Questions