Reputation: 49
I'm fairly new to Pandas and am really stumped on a project.
I have a Pandas dataframe that looks like this:
ID | Retailers |
---|---|
ID_1 | ["retailer 1", "retailer 2", "retailer 3"] |
ID_2 | ["retailer 1", "retailer 2"] |
My goal is:
Using retailer 3 as an example:
ID | Retailers | retailer 3 |
---|---|---|
ID_1 | ["retailer 1", "retailer 2", "retailer 3"] | 1 |
ID_2 | ["retailer 1", "retailer 2"] | 0 |
Hope that makes sense.
Upvotes: 3
Views: 88
Reputation: 8768
Try this, and see if it gives you what you were expecting.
df.join(df['Retailers'].explode().str.get_dummies().groupby(level=0).sum())
Upvotes: 0
Reputation: 1804
df = df.set_index('ID')
df
Retailers
ID
ID_1 [retailer 1, retailer 2, retailer 3]
ID_2 [retailer 1, retailer 2]
Firstly, you can explode the Retailers
columns. Then you can add a dummy column of 1 to populate 1 wherever required in the final table
temp = df['Retailers'].explode().to_frame()
temp['vals'] = 1
temp
Retailers vals
ID
ID_1 retailer 1 1
ID_1 retailer 2 1
ID_1 retailer 3 1
ID_2 retailer 1 1
ID_2 retailer 2 1
Then you can create pivot table using pivot to get the table. The NaN
s can be populated with 0
.
out_df = (pd.pivot(temp, columns='Retailers')
.fillna(0)
.astype(int)
.droplevel(level=0, axis=1) # formating, remove extra level of columns
.rename_axis(None, axis=1)) # formating, remove the axis name
out_df
As the Retailers
column has to be retained,
out_df.insert(0, 'Retailers', df['Retailers'])
out_df
Upvotes: 1
Reputation: 1758
This is a way to do it:
import pandas as pd
df = pd.DataFrame({'ID': {0: 'ID_1', 1: 'ID_2'},
'Retailers': {0: ["retailer 1", "retailer 2", "retailer 3"],
1: ["retailer 1", "retailer 2"]},
'retailer 3': {0: 1, 1: 0}})
# get unique retailers
retailers = set(df['Retailers'].sum())
# create variables
for r in retailers:
df[r] = df['Retailers'].apply(lambda x: r in x).astype(int)
Upvotes: 1