app engine (gae), python, ndb, deferred library or task chain, cron job or all?

I'm not very familiar with task chaining, pull queues, push queues, cron jobs, the deferred library, etc., but I know I need to use one of these for the task at hand. I'm not sure what the best approach would be.

I have an ndb.Model with a certain property that needs to be updated.

I will have thousands and thousands of these ndb.Model entities, and the same property will need to be updated on each one.

This property will not need to be updated frequently, as it will not be accessed by the end user.

Solutions I've looked into:

MapReduce seems like overkill, and I thought I read something about it needing a CSV file, which scares me away from that.
It seems like task chaining combined with a cron job could work for this, but I don't know if that's possible. I'm new to both and would like some confirmation.

I've read about the deferred library. Would that be the best bet?

Upvotes: 1

Answers (2)

Y2H

Reputation: 2537

Depending on the exact operations you’re doing and the amount of data you’re doing them on, both could work. MapReduce is better suited to this sort of task, however.

If you’re worried about the CSV part, here is a rough idea of how to handle it. You can upload your CSV file with the code and then read it with something like this:

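import csv
import StringIO  # Python 2 module, as used by the App Engine Python 2.7 runtime

# Read the bundled CSV file, then wrap its text so csv.reader can parse it.
with open("file_path/file_name.csv", "r") as fo:
    fr = fo.read()
csv_io = StringIO.StringIO(fr)
reader = csv.reader(csv_io)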

Now this reader is an iterator over the rows in the file (not a list, so it can only be traversed once), and each row is a list of the values that were separated by a comma (,) in the original CSV file. You can use a for loop to go over the rows, and inside each row another for loop to read the individual values.
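The nested loop described above could look roughly like the sketch below. It is only an illustration: `SAMPLE_CSV` and `read_rows` are hypothetical names, the data is made up, and it uses `io.StringIO` so that it runs on Python 3, whereas the snippet above uses the Python 2 `StringIO` module.

```python
import csv
import io

# Hypothetical sample data standing in for the uploaded CSV file.
SAMPLE_CSV = "key,new_value\nuser-1,10\nuser-2,20\n"


def read_rows(csv_text):
    """Return every row of the CSV text as a list of string values."""
    reader = csv.reader(io.StringIO(csv_text))
    rows = []
    for row in reader:        # outer loop: one row per line of the file
        values = []
        for value in row:     # inner loop: one value per comma-separated field
            values.append(value.strip())
        rows.append(values)
    return rows


rows = read_rows(SAMPLE_CSV)
```

Collecting the rows into a list is only needed if they must be traversed more than once; otherwise the reader can be consumed directly in the outer loop.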

Upvotes: 0

Dan Cornilescu

Reputation: 39824

A cron job itself is practically a scheduled task, so yes, cron + task queues are possible :)

Well, it's always good to go through the docs to get a better idea:

In this article there is a guiding note which may also help:

When to use ext.deferred

You may be wondering when to use ext.deferred, and when to stick with the built-in task queue API. Here are our suggestions.

You may want to use the deferred library if:

You only use the task queue lightly.

You want to refactor existing code to run on the Task Queue with a minimum of changes.

You're writing a one off maintenance task, such as schema migration.

Your app has many different types of background tasks, and writing a separate handler for each would be burdensome.

Your task requires complex arguments that aren't easily serialized without using Pickle.

You are writing a library for other apps that needs to do background work.

You may want to use the Task Queue API if:

You need complete control over how tasks are queued and executed.

You need better queue management or monitoring than deferred provides.

You have high throughput, and overhead is important.

You are building larger abstractions and need direct control over tasks.

You like the webhook model better than the RPC model.

Naturally, you can use both the Task Queue API and the deferred library side-by-side, if your app has requirements that fit into both groups.
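For a one-off update like the one in the question, the deferred-library option can be sketched as a chain of tasks, each updating one page of entities and then queuing the next. The version below is a minimal sketch that runs without App Engine: `ENTITIES`, `fetch_page`, `defer`, `run_queue`, and the `score` property are all hypothetical stand-ins. On App Engine they would presumably be replaced by an ndb query's `fetch_page(page_size, start_cursor=cursor)`, `ndb.put_multi`, and `deferred.defer`, with a cron-triggered handler queuing the first task.

```python
from collections import deque

# In-memory stand-ins (hypothetical) so the sketch runs anywhere.
ENTITIES = [{"id": i, "score": 0} for i in range(250)]
TASK_QUEUE = deque()


def fetch_page(page_size, cursor):
    """Mimic the (results, cursor, more) shape returned by ndb's fetch_page."""
    start = cursor or 0
    page = ENTITIES[start:start + page_size]
    next_cursor = start + len(page)
    return page, next_cursor, next_cursor < len(ENTITIES)


def defer(func, *args, **kwargs):
    """Stand-in for deferred.defer: queue the call instead of running it now."""
    TASK_QUEUE.append((func, args, kwargs))


def update_batch(cursor=None, batch_size=100):
    """Update one page of entities, then chain a task for the next page."""
    page, next_cursor, more = fetch_page(batch_size, cursor)
    for entity in page:
        entity["score"] = entity["id"] * 2  # the property being updated
    # On App Engine, this is where ndb.put_multi(page) would save the batch.
    if more:
        defer(update_batch, cursor=next_cursor, batch_size=batch_size)


def run_queue():
    """Drain the fake queue and return how many tasks were executed."""
    executed = 0
    while TASK_QUEUE:
        func, args, kwargs = TASK_QUEUE.popleft()
        func(*args, **kwargs)
        executed += 1
    return executed


defer(update_batch)      # what a cron-triggered handler would do
tasks_run = run_queue()  # App Engine's task queue does this for you
```

Keeping each task to one page keeps every request short, and passing the cursor along means a retried task resumes from its own page rather than from the start.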

The Google App Engine Pipeline API (actually mapreduce + task queues) is nicely described in this article:

The Google App Engine Pipeline API connects together complex workflows (including human tasks). The goals are flexibility, workflow reuse, and testability.

Upvotes: 2
