Reputation: 55
let say i have a dataframe that consist of these two columns.
User_id hotel_cluster
1 0
2 2
3 2
3 3
3 0
4 2
i want to change it into something like this. Do i need to write a function or is there a pandas way to do it?
User_id hotel_cluster_0 hotel_cluster_1 hotel_cluster_2 hotel_cluster_3
1 1 0 0 0
2 0 0 1 0
3 1 0 1 1
4 0 0 1 0
Please help! Sorry if i am not posting the question in the right format Thank you!
Upvotes: 2
Views: 46
Reputation: 88276
A simple way if you do not need the non-appearing values is to use pd.get_dummies
:
pd.get_dummies(df.hotel_cluster, prefix = 'hotel_cluster').set_index(df.User_id)
Otherwise you want something like @piRSquared's solution.
Upvotes: 1
Reputation: 294488
IIUC:
First change 'hotel_cluster'
to a categorical that includes categories that don't exist
col = 'hotel_cluster'
df[col] = pd.Categorical(df[col], categories=[0, 1, 2, 3])
pd.crosstab(*map(df.get, df)).add_prefix(f"{col}_")
hotel_cluster hotel_cluster_0 hotel_cluster_1 hotel_cluster_2 hotel_cluster_3
User_id
1 1 0 0 0
2 0 0 1 0
3 1 0 1 1
4 0 0 1 0
Reindex after crosstab
pd.crosstab(*map(df.get, df)).reindex(
columns=range(4), fill_value=0
).add_prefix('hotel_cluster_')
hotel_cluster hotel_cluster_0 hotel_cluster_1 hotel_cluster_2 hotel_cluster_3
User_id
1 1 0 0 0
2 0 0 1 0
3 1 0 1 1
4 0 0 1 0
Upvotes: 2