Fill in missing boolean rows in Pandas

Question

I have a MySQL query that is doing a groupby and returning data in the following form:

ID | Boolean | Count

Sometimes there isn't data in the table for one of the boolean states, so data for a single ID might be returned like this:

1234 | 0 | 10

However, I need it in this form for downstream analysis:

1234 | 0 | 10
1234 | 1 | 0

with an index on [ID, Boolean].

From querying Google and SO, it seems like getting MySQL to do this transform is a bit of a pain. Is there a simple way to do this in Pandas? I haven't been able to find anything useful in the docs or the Pandas cookbook.

You can assume that I've already loaded the data into a Pandas dataframe with no indexes.

Thanks.

Paul H · Accepted Answer

I would set the index of your dataframe to the ID and Boolean columns, then construct a new index from the Cartesian product of the unique IDs and the two boolean values, and reindex against it.

That would look like this:

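import pandas
indexcols = ['ID', 'Boolean']

# `querytext` (the SQL string) and `engine` (the database connection)
# are assumed to be defined elsewhere; the SQL comes first, then the connection
data = pandas.read_sql_query(querytext, engine)

# every (ID, Boolean) pair that should exist in the result
full_index = pandas.MultiIndex.from_product(
    [data['ID'].unique(), [0, 1]],
    names=indexcols
)

data = (
    data.set_index(indexcols)
        .reindex(full_index)  # missing pairs appear as NaN rows
        .fillna(0)            # a missing count means zero
        .reset_index()
)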
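
To try this without a database, here is a minimal self-contained sketch of the same approach. The sample rows and the name `filled` are made up for illustration. It also swaps `fillna(0)` for the `fill_value` argument of `reindex`, since filling after the reindex turns an integer Count column into floats.

```python
import pandas as pd

indexcols = ["ID", "Boolean"]

# Hypothetical sample standing in for the query result:
# ID 1234 has no row for Boolean == 1
data = pd.DataFrame({
    "ID": [1234, 5678, 5678],
    "Boolean": [0, 0, 1],
    "Count": [10, 4, 7],
})

# Every (ID, Boolean) pair that should be present
full_index = pd.MultiIndex.from_product(
    [data["ID"].unique(), [0, 1]],
    names=indexcols,
)

filled = (
    data.set_index(indexcols)
        .reindex(full_index, fill_value=0)  # new rows get Count == 0
        .reset_index()
)

print(filled)
```

Leaving off the final `reset_index()` keeps the result indexed on [ID, Boolean], which is the form the question asks for.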
