Reputation:
I'm using pandas
package to perform some data aggregation and manipulation. I have a code that looks like this:
df = df[df['OT'] == 'OT']
pd.crosstab(df['Delivery'], [df['Week'], df['Description']]).stack().fillna(0)
It generates the following output, which is correct:
Week 201909 201910 201911 201912 201913 201915 201917
Delivery Description
Normal CD 23 0.0 0.0 0.0 23 0.0 2
CDG 26 6.0 0.0 0.0 33 0.0 6
CDH 40 4.0 3.0 5.0 78 0.0 4
Fast CD 4 0.0 0.0 0.0 6 0.0 0
CDG 6 2.0 2.0 1.0 6 1.0 0
CDH 5 0.0 0.0 4.0 5 0.0 0
The problem is that there are also rows that have Week
column with values 201914
or 201916
, though none of these rows that an OT
column value that satisfy the condition I have for filtering (i.e. df['OT'] == 'OT'
). Hence, initially they all get wiped out, and later there doesn't exist any row with column values for Week
that equal 201914
or 201916
. But, what I want to do in my crosstab is that in such a case, I want to include all 0 values for that column. So, in my above example, the end result should also include a column for 201914
and 201916
, and all the values should be 0.0
. How can I achieve this with pandas
?
Upvotes: 0
Views: 661
Reputation: 323276
First you can check your column type to category
, then when you done with manipulation of your data , just change the columns type back to str
df['Week']=df['Week'].astype('category')
s=pd.crosstab(df['Delivery'], [df['Week'], df['Description']])
s.columns=s.columns.astype(str)
s.stack().fillna(0)
Upvotes: 1