Reputation: 227
I have this data frame:
user day
A 1
A 4
B 2
B 4
I want to change the data frame into this:
user day_1 day_2 day_3 day_4
A 1 0 0 1
B 0 1 0 1
So it should automatically generate day_3 even though there is no user on day_3.
I have tried this code, but it doesn't work:
for index, row in grouped_user.iterrows():
    grouped_user["day_" + str(int(row.active_period))] = 1
Upvotes: 1
Views: 84
Reputation: 863751
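Use get_dummies after converting the values to strings, then aggregate with max:
import pandas as pd
# dtype=int keeps 0/1 output; pandas 2.0+ returns booleans by default
df1 = pd.get_dummies(df.astype(str), columns=['day'], dtype=int).groupby('user', as_index=False).max()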
print (df1)
user day_1 day_2 day_4
0 A 1 0 1
1 B 0 1 1
If you also need the missing days, convert user to the index, use get_dummies, and then add all possible days as columns with DataFrame.reindex:
days = [f'day_{x}' for x in range(df['day'].min(), df['day'].max() + 1)]
# groupby(level=0).max() replaces .max(level=0), which was removed in pandas 2.0
df1 = (pd.get_dummies(df.set_index('user').astype(str), dtype=int)
         .groupby(level=0)
         .max()
         .reindex(columns=days, fill_value=0)
         .reset_index())
print (df1)
user day_1 day_2 day_3 day_4
0 A 1 0 0 1
1 B 0 1 0 1
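If the range of days should not depend on what happens to be in the data (for example, the last days have no users at all), the same idea can be wrapped in a small helper. This is only a sketch: the function name one_hot_days and its first_day / last_day parameters are made up for illustration, and it assumes the columns are called user and day as in the question.

```python
import pandas as pd

def one_hot_days(df, first_day=None, last_day=None):
    """Return one row per user with a 0/1 column for every day in the range."""
    # Fall back to the span of days present in the data.
    first_day = df["day"].min() if first_day is None else first_day
    last_day = df["day"].max() if last_day is None else last_day
    days = [f"day_{d}" for d in range(first_day, last_day + 1)]
    return (
        pd.get_dummies(df.set_index("user")[["day"]].astype(str), dtype=int)
        .groupby(level=0)
        .max()                                # 1 if the user was seen that day
        .reindex(columns=days, fill_value=0)  # add the days nobody was seen
        .reset_index()
    )

# Sample data from the question.
df = pd.DataFrame({"user": ["A", "A", "B", "B"], "day": [1, 4, 2, 4]})
print(one_hot_days(df))
print(one_hot_days(df, last_day=6))  # also creates empty day_5 and day_6
```

Called without extra arguments it should give the same table as above; passing last_day=6 adds day_5 and day_6 filled with 0.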
Another solution uses crosstab and DataFrame.clip:
df1 = (pd.crosstab(df['user'], df['day'])
         .clip(upper=1)
         .reindex(range(df['day'].min(), df['day'].max()+1), fill_value=0, axis=1)
         .add_prefix('day_')
         .rename_axis(None, axis=1)
         .reset_index())
print (df1)
user day_1 day_2 day_3 day_4
0 A 1 0 0 1
1 B 0 1 0 1
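The clip(upper=1) step only matters when the same user appears more than once on the same day, because crosstab counts occurrences. A small sketch with made-up data (user A is listed twice on day 1, which is not the case in the question's sample):

```python
import pandas as pd

# Hypothetical data: user A appears twice on day 1.
df = pd.DataFrame({"user": ["A", "A", "A", "B"], "day": [1, 1, 4, 2]})

counts = pd.crosstab(df["user"], df["day"])  # number of rows per user/day
flags = counts.clip(upper=1)                 # cap every count at 1

print(counts.loc["A", 1])  # 2
print(flags.loc["A", 1])   # 1
```

Without the clip, the final table would contain 2 instead of 1 for that cell.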
Upvotes: 2
Reputation: 17911
You can use the function pivot_table():
df.assign(vals=1).\
pivot_table(index='user', columns='day', values='vals', fill_value=0).\
reindex(range(df['day'].min(), df['day'].max()+1), fill_value=0, axis=1).\
add_prefix('day_')
Result:
day day_1 day_2 day_3 day_4
user
A 1 0 0 1
B 0 1 0 1
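Note that this result keeps user as the index and day as the name of the column axis. If you want a plain user column as in the question's expected output, one possible follow-up (a sketch, with the variable names wide and flat chosen only for illustration) is rename_axis plus reset_index:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"user": ["A", "A", "B", "B"], "day": [1, 4, 2, 4]})

wide = (
    df.assign(vals=1)
    .pivot_table(index="user", columns="day", values="vals", fill_value=0)
    .reindex(range(df["day"].min(), df["day"].max() + 1), fill_value=0, axis=1)
    .add_prefix("day_")
)

# Drop the "day" label on the column axis and turn the index into a column.
flat = wide.rename_axis(None, axis=1).reset_index()
print(flat.columns.tolist())  # ['user', 'day_1', 'day_2', 'day_3', 'day_4']
```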
Upvotes: 3