Cross-tabulation (contingency table) using Python ORM?

Question

Anyone doing very basic statistical explorations on data living in a relational database has had to calculate cross tabulations,also known as contingency tables(wikipedia page). these are indispensable when you need to count how many items fall more than one category at the same time. For example: How many customers are female and like chocolate?

Scipy has ways to do this for matrices, using a variation of histogram2d But for meaningfull statistical analysis you need to be able to have a table (with variable names) from which you can specify which variables you would like to tabulate. Moreover, it needs to work for other types of variables, not only numeric. Actually, numeric tabulation is the more complicated one since it requires binning. R naturally has such a function which is called table, which could easily be ported to Python. However, remember I mention at the title that I'd like to use an ORM, why? Because cross-tabulations are much smaller that the data used to generate it, and you could have a 2x2 table calculated from billions of records on a database. My point is: In serious applications, you can't afford to bring all your data to memory and loop through it. So you'd have to have table design converted into a SQL query so that all the counting would be done by the database engine. And the ORM would take care of the necessary SQL dialect adjustments necessary so that you could run your code with any database backend. A example of the SQL (in MySQL dialect) for a simple cross tabulation can be found here.

So now that I think I have motivated you about the problem, here are the questions: Is this functionality implemented in any Python ORM? How would you implement this using, say, SQLAlchemy or Django ORM?

Cross-tabulation (contingency table) using Python ORM?

Answers (1)

Related Questions