Reputation: 1583
So it's supposed that when passing the argument observed=False to pandas.pivot_table should create missing columns but it doesn't.
In my example, I have a categorical column "hours", and I want to get the whole 24 columns even if input data has not all the hours.
import pandas as pd
df = pd.DataFrame([["user_1", 23],
["user_1", 22],
["user_2", 21]],
columns=["user", "hour"]).set_index("user")
df["hour"] = pd.Categorical.from_codes(df["hour"], categories=range(24))
df = df.assign(count=True)
df.pivot_table(columns="time",
index="user",
values="count",
fill_value=False,
observed=False,
)
Output I get:
22 23
user
user_1 True True
user_2 True False
Expected output:
0 ... 22 23
user
user_1 False ... True True
user_2 False ... True False
Upvotes: 2
Views: 990
Reputation: 22503
You can skip the Categorical
and do a reindex
on axis=1
after pivot:
df.pivot_table(columns="hour",
index="user",
values="count",
fill_value=False,
observed=False,
).reindex(range(24), axis=1, fill_value=False)
hour 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
user
user_1 False False False False False False False False False False False False False False False False False False False False False False True True
user_2 False False False False False False False False False False False False False False False False False False False False False True False False
Upvotes: 2