user1034697


Data Warehouse and Django

This is more of an architectural question than a technological one per se.

I am currently building a business website/social network that needs to store large volumes of data and use that data to derive analytics (e.g., consumer behavior).

I am using Django and a PostgreSQL database.

My question is this: I want to extend this architecture with a data warehouse. Ideally, the current Django PostgreSQL database would remain the operational DB, and the data warehouse would be a separate, additional store, preferably using a multidimensional model.

We are still at a very early stage and will be testing with 50 users, so something as primitive as a one-column table would be enough to start with.

I would like to know whether anyone has experience with this situation and could recommend a framework for building a data warehouse, while keeping the operational DB on the Django models for ease of use (if possible).
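One common way to keep the Django models on the operational database while adding a separate warehouse is Django's multiple-database support combined with a database router. The following is a minimal sketch, assuming a second PostgreSQL database aliased `warehouse` and a hypothetical `analytics` app that holds the warehouse models; the database names and the `myproject.routers` module path are placeholders.

```python
# settings.py (excerpt). Database names and the router path are placeholders.
DATABASES = {
    "default": {  # operational (OLTP) database used by the regular Django models
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app_db",
    },
    "warehouse": {  # separate database for the analytical tables
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "warehouse_db",
    },
}

DATABASE_ROUTERS = ["myproject.routers.WarehouseRouter"]


# myproject/routers.py (shown inline here for brevity)
class WarehouseRouter:
    """Route models of the hypothetical "analytics" app to the warehouse DB."""

    app_label = "analytics"

    def db_for_read(self, model, **hints):
        if model._meta.app_label == self.app_label:
            return "warehouse"
        return None  # no opinion: Django tries other routers, then "default"

    def db_for_write(self, model, **hints):
        if model._meta.app_label == self.app_label:
            return "warehouse"
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Create analytics tables only in the warehouse,
        # and all other apps' tables only in the default database.
        if app_label == self.app_label:
            return db == "warehouse"
        return db == "default"
```

With this in place, `python manage.py migrate --database=warehouse` creates the analytics tables in the warehouse, and a query can also target it explicitly with `SomeModel.objects.using("warehouse")` (where `SomeModel` is a placeholder).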

Thank you in advance!

Upvotes: 5

Views: 4197

Answers (2)

C. Ramseyer

Reputation: 2382

Here are some cool Open Source tools I used recently:

  • Kettle - great ETL tool, you can use this to extract the data from your operational database into your warehouse. Supports any database with a JDBC driver and makes it very easy to build e.g. a star schema.
  • Saiku - nice Web 2.0 frontend built on Pentaho Mondrian (an OLAP server that implements MDX). It allows your users to easily build complex aggregation queries (think of a pivot table in Excel), and the Mondrian layer provides caching etc. to make things fast. There is also an online demo you can try.
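To make the Kettle/Saiku workflow more concrete, here is a hand-rolled sketch of the same idea in plain Python: extract rows from an operational table, load them into a tiny star schema (two dimension tables plus a fact table), and run the kind of roll-up query a Mondrian/Saiku pivot would issue. It uses in-memory `sqlite3` databases as stand-ins for PostgreSQL, and the `purchase` table and its columns are assumptions for illustration. Kettle itself is configured through its own designer, not with code like this.

```python
import sqlite3


def build_star_schema(src, dwh):
    """Copy rows of a hypothetical operational `purchase` table into a star schema."""
    dwh.executescript("""
        CREATE TABLE IF NOT EXISTS dim_user (
            user_key INTEGER PRIMARY KEY, user_id INTEGER UNIQUE, country TEXT);
        CREATE TABLE IF NOT EXISTS dim_date (
            date_key TEXT PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE IF NOT EXISTS fact_purchase (
            user_key INTEGER REFERENCES dim_user(user_key),
            date_key TEXT REFERENCES dim_date(date_key),
            amount REAL);
    """)
    # Extract: read the raw rows from the operational database.
    rows = src.execute("SELECT user_id, country, purchased_on, amount FROM purchase")
    for user_id, country, day, amount in rows:
        # Transform: add dimension rows once, then look up the surrogate key.
        dwh.execute("INSERT OR IGNORE INTO dim_user (user_id, country) VALUES (?, ?)",
                    (user_id, country))
        user_key = dwh.execute("SELECT user_key FROM dim_user WHERE user_id = ?",
                               (user_id,)).fetchone()[0]
        year, month = int(day[:4]), int(day[5:7])  # day is 'YYYY-MM-DD'
        dwh.execute("INSERT OR IGNORE INTO dim_date VALUES (?, ?, ?)", (day, year, month))
        # Load: one fact row per purchase, pointing at its dimension rows.
        dwh.execute("INSERT INTO fact_purchase VALUES (?, ?, ?)", (user_key, day, amount))
    dwh.commit()


def revenue_by_country_and_month(dwh):
    """Roll the facts up along two dimensions, like a simple pivot table."""
    return dwh.execute("""
        SELECT u.country, d.year, d.month, SUM(f.amount)
        FROM fact_purchase f
        JOIN dim_user u ON u.user_key = f.user_key
        JOIN dim_date d ON d.date_key = f.date_key
        GROUP BY u.country, d.year, d.month
        ORDER BY u.country, d.year, d.month
    """).fetchall()


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE purchase "
                "(user_id INTEGER, country TEXT, purchased_on TEXT, amount REAL)")
    src.executemany("INSERT INTO purchase VALUES (?, ?, ?, ?)", [
        (1, "MT", "2012-01-05", 10.0),
        (1, "MT", "2012-01-20", 5.0),
        (2, "CH", "2012-02-01", 7.5),
    ])
    dwh = sqlite3.connect(":memory:")
    build_star_schema(src, dwh)
    print(revenue_by_country_and_month(dwh))
```

The surrogate keys in the dimension tables decouple the warehouse from the operational primary keys, which is the main reason star schemas use them; a real ETL job would also run incrementally rather than reloading every row.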

Upvotes: 6

Joseph Victor Zammit

Reputation: 15320

My answer does not necessarily apply to data warehousing. In your case I see the possibility to implement a NoSQL database solution alongside an OLTP relational storage, which in this case is PostgreSQL.

Why consider NoSQL? In addition to the obvious scalability benefits, NoSQL databases offer a number of advantages that will probably apply to your scenario: for instance, the flexibility of having records with different sets of fields, and key-based access.
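As a toy illustration of those two properties (records with differing fields, and access by key), the sketch below uses an in-memory Python dictionary. `ToyDocumentStore` is hypothetical and is not the API of SimpleDB, the App Engine Datastore, or any other NoSQL product.

```python
from typing import Any, Dict, Optional


class ToyDocumentStore:
    """In-memory stand-in for a key-value/document store (illustration only)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, doc: Dict[str, Any]) -> None:
        # No fixed schema: each record may carry a different set of fields.
        self._docs[key] = dict(doc)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        # Key-based access: a single lookup by key, no joins.
        doc = self._docs.get(key)
        return dict(doc) if doc is not None else None


store = ToyDocumentStore()
store.put("event:1", {"user": 1, "type": "page_view", "url": "/shop"})
store.put("event:2", {"user": 2, "type": "purchase", "amount": 7.5, "items": 3})
```

In a relational table, the second event's extra `amount` and `items` fields would need nullable columns or a separate table; here they simply live on that one record.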

Since you're still at the "trial" stage, you might find it easier to choose a NoSQL solution based on your hosting provider. For instance, AWS has SimpleDB, Google App Engine provides its own Datastore, etc. However, there are plenty of other NoSQL solutions with good Python bindings.

Upvotes: 0
