Reputation: 7281
I have a very large (308801, 256) dataframe
I am working with. In the dataframe
, there is a column, ON_TIME
, which holds the values 1 or 0 (yes, we made the delivery on time, or no, we did not).
I would like to, for each column, count the amount of times each value was a 1, or a 0. An example dataset looks like this:
| Delivery_Type | Delivery_Driver | ON_TIME |
|:-------------:|:---------------:|:-------:|
| A | Wundermahn | 1 |
| B | Wundermahn | 0 |
| B | Wundermahn | 0 |
| A | Jon Skeet | 1 |
| C | Jon Skeet | 1 |
| A | Wundermahn | 0 |
I want a dataset, for each column, that looks like this:
| Delivery_Type | ON_TIME_1 | ON_TIME_0 |
|:-------------:|:---------:|:---------:|
| A | 2 | 1 |
| B | 0 | 2 |
| C | 1 | 0 |
I know in SQL, I could do something like:
SELECT
DELIVERY_TYPE,
SUM(CASE WHEN ON_TIME = 1 THEN 1 ELSE 0 END AS ON_TIME_1) AS ON_TIME_1,
SUM(CASE WHEN ON_TIME = 0 THEN 1 ELSE 0 END AS ON_TIME_0) AS ON_TIME_0
FROM
dataframe
GROUP BY
DELIVERY_TYPE
But how can I do this in Python? I have tried:
for col in df:
temp = df[col].groupby('ON_TIME')
print(temp)
But that is not working. How can I achieve the desired result in pandas?
Upvotes: 1
Views: 97
Reputation: 30930
Use pd.crosstab
with DataFrame.add_prefix
:
df_count = (pd.crosstab(df['Delivery_Type'], df['ON_TIME'], colnames=[None])
.add_prefix('ON_TIME_')
.reset_index())
print(df_count)
Delivery_Type ON_TIME_0 ON_TIME_1
0 A 1 2
1 B 2 0
2 C 0 1
Upvotes: 2