Juan Carlos Coto

Reputation: 12564

Durable architecture in Python distributed application

Wondering about durable architectures for distributed Python applications. This question I asked before should provide a little guidance about the sort of application it is. We would like to be able to run several code servers and several database servers, and ideally have some method of deployment that is manageable and not too much of a pain.

The question I mentioned provides an answer that I like, but I wonder how it could be made more durable, or if doing so requires using other technologies. In particular:

I would have my frontend endpoints be the WSGI containers (because you already have that written) and write the backend to be distributed via messages. Then you would have a pool of backend nodes that would pull messages off of the Celery queue and complete the required work. It would look sort of like:

Apache -> WSGI Containers -> Celery Message Queue -> Celery Workers.

The Apache nodes would be behind a load balancer of some kind. This would be a fairly simple architecture to scale and is, if done correctly, fairly reliable. Code for failure in a system like this and you will be fine.
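
For concreteness, this is roughly how I picture one of those backend tasks as a Celery task; the module, broker URL, and names below are made up for illustration:

    # tasks.py -- illustrative only; broker URL and names are placeholders
    from celery import Celery

    app = Celery("backend", broker="amqp://broker-host//")

    class TransientError(Exception):
        """Stand-in for a recoverable failure (timeout, dropped connection, ...)."""

    def do_expensive_work(order_id):
        """Stand-in for the real backend logic."""
        return {"order_id": order_id, "status": "done"}

    @app.task(bind=True, max_retries=3)
    def process_order(self, order_id):
        try:
            return do_expensive_work(order_id)
        except TransientError as exc:
            # "Code for failure": retry the job on a transient fault
            # instead of silently losing it.
            raise self.retry(exc=exc, countdown=5)

The WSGI containers would then just call process_order.delay(order_id) and return immediately, leaving the worker pool to do the heavy lifting.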

What is the best way to make durable applications? Any suggestions on how to either "code for failure" or design it differently so that we don't necessarily have to? If you think Python might not be suited for this, that is also a valid solution.

Upvotes: 2

Views: 465

Answers (1)

sean

Reputation: 3985

Well, to continue from the previous answer I gave:

In my projects I code for failure, because I use AWS for a lot of my hosting needs.

I have implemented database backends that check whether the database in a given region is accessible, and if not, choose another region from a specified list. This happens transparently to the rest of the system on that node. So if the east-1a region goes down, it fails over into one of the other regions I also host in, such as the west coast. I keep track of in-flight database transactions, send them over to the west coast, and dump them to a file so I can import them into the old database region once it becomes available again.
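
A minimal sketch of that failover idea, assuming a preference-ordered list of regions and a hypothetical connect() helper (not my actual code):

    import logging

    REGIONS = ["us-east-1", "us-west-1", "us-west-2"]  # preferred region first

    def connect(region):
        """Placeholder: open a connection to the database hosted in this region."""
        return {"region": region}  # a real driver call would go here

    def get_connection(regions=REGIONS):
        """Return a connection from the first region that is reachable."""
        for region in regions:
            try:
                return connect(region)
            except ConnectionError:
                logging.warning("database in %s unreachable, failing over", region)
        raise RuntimeError("no database region reachable")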

My front end servers sit behind an Elastic Load Balancer that is distributed across multiple regions, which allows for durable recovery if a region fails. But it cannot be relied upon completely, so I am looking into solutions such as running HAProxy and switching my DNS over to it in the case that my ELB goes down. This is a work in progress and I cannot give specifics on my own solution.
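
The general shape is something like the following, though I want to stress this is a hypothetical sketch, not my actual tooling; the health endpoint, hosted zone, record name, and HAProxy address are all made up:

    # Check the ELB; if it looks dead, repoint a Route 53 record at a standby HAProxy box.
    import boto3
    import requests

    ELB_URL = "http://my-elb.example.com/health"   # assumed health endpoint
    HOSTED_ZONE_ID = "Z123EXAMPLE"                 # assumed Route 53 zone
    RECORD_NAME = "www.example.com."
    HAPROXY_IP = "203.0.113.10"                    # standby HAProxy address

    def elb_healthy():
        try:
            return requests.get(ELB_URL, timeout=5).status_code == 200
        except requests.RequestException:
            return False

    def fail_over_to_haproxy():
        boto3.client("route53").change_resource_record_sets(
            HostedZoneId=HOSTED_ZONE_ID,
            ChangeBatch={"Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "A",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": HAPROXY_IP}],
                },
            }]},
        )

    if __name__ == "__main__":
        if not elb_healthy():
            fail_over_to_haproxy()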

To make your data processing durable, look into Celery and store the results in a distributed MongoDB deployment to keep them safe. Using a durable data store for your results allows you to get them back in the event of a node crash. It comes at the cost of some performance, but it shouldn't be too terrible if you only have soft real-time constraints.
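
As a rough sketch, pointing Celery's result backend at MongoDB looks something like this (the broker and Mongo host names are placeholders):

    from celery import Celery

    app = Celery(
        "backend",
        broker="amqp://broker-host//",
        backend="mongodb://mongo-host:27017/celery_results",
    )

    @app.task
    def crunch(numbers):
        # The return value is written to MongoDB, so it can still be fetched
        # even if the node that queued it (or a worker) goes down afterwards.
        return sum(numbers)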

http://www.mnxsolutions.com/amazon/designing-for-failure-with-amazon-web-services.html

The above article talks mostly about AWS, but the ideas apply to any system where you need high availability and durability. Just remember that downtime is OK as long as you minimize it and it only affects a subset of users.

Upvotes: 3
