Determine counts for each column by label

Question

I have a very large (308801, 256) dataframe I am working with. In the dataframe, there is a column, ON_TIME, which holds the values 1 or 0 (yes, we made the delivery on time, or no, we did not).

I would like to, for each column, count the amount of times each value was a 1, or a 0. An example dataset looks like this:

| Delivery_Type | Delivery_Driver | ON_TIME |
|:-------------:|:---------------:|:-------:|
|       A       |    Wundermahn   |    1    |
|       B       |    Wundermahn   |    0    |
|       B       |    Wundermahn   |    0    |
|       A       |    Jon Skeet    |    1    |
|       C       |    Jon Skeet    |    1    |
|       A       |    Wundermahn   |    0    |

I want a dataset, for each column, that looks like this:

| Delivery_Type | ON_TIME_1 | ON_TIME_0 |
|:-------------:|:---------:|:---------:|
|       A       |     2     |     1     |
|       B       |     0     |     2     |
|       C       |     1     |     0     |

I know in SQL, I could do something like:

SELECT
    DELIVERY_TYPE,
    SUM(CASE WHEN ON_TIME = 1 THEN 1 ELSE 0 END AS ON_TIME_1) AS ON_TIME_1,
    SUM(CASE WHEN ON_TIME = 0 THEN 1 ELSE 0 END AS ON_TIME_0) AS ON_TIME_0
FROM
    dataframe
GROUP BY
    DELIVERY_TYPE

But how can I do this in Python? I have tried:

for col in df:
    temp = df[col].groupby('ON_TIME')
    print(temp)

But that is not working. How can I achieve the desired result in pandas?

ansev · Accepted Answer

Use pd.crosstab with DataFrame.add_prefix:

df_count = (pd.crosstab(df['Delivery_Type'], df['ON_TIME'], colnames=[None])
              .add_prefix('ON_TIME_')
              .reset_index())
print(df_count)

  Delivery_Type  ON_TIME_0  ON_TIME_1
0             A          1          2
1             B          2          0
2             C          0          1

Determine counts for each column by label

Answers (1)

Related Questions