Reputation: 1729
I have a MySQL query that is doing a groupby and returning data in the following form:
ID | Boolean | Count
Sometimes there isn't data in the table for one of the boolean states, so data for a single ID might be returned like this:
1234 | 0 | 10
However I need it in this form for downstream analysis:
1234 | 0 | 10
1234 | 1 | 0
with an index on [ID, Boolean].
From querying Google and SO, it seems like getting MySQL to do this transform is a bit of a pain. Is there a simple way to do this in Pandas? I haven't been able to find anything useful in the docs or the Pandas cookbook.
You can assume that I've already loaded the data into a Pandas dataframe with no indexes.
Thanks.
Upvotes: 0
Views: 88
Reputation: 68146
I would set the index of your dataframe to the ID
and Boolean
columns, and the construct an new index from the Cartesian product of the unique values.
That would look like this:
import pandas
indexcols = ['ID', 'Boolean']
data = pandas.read_sql_query(engine, querytext)
full_index = pandas.MultiIndex.from_product(
[data['ID'].unique(), [0, 1]],
names=indexcols
)
data = (
data.set_index(indexcols)
.reindex(full_index)
.fillna(0)
.reset_index()
)
Upvotes: 1