I'm fairly new to Pandas and am really stumped on a project. I have a Pandas dataframe that looks like this: ID Retailers ID_1 ["retailer 1", "retailer 2", "retailer 3"] ID_2 ["retailer 1", "retailer 2"] My goal is: Loop through a list of all possible retailers. Create a column for each possible retailer If the current retailer is in the row's list of retailers, then they should be coded as a 1 If that retailer is not in the list, then they should be coded as a 0 Using retailer 3 as an example: ID Retailers retailer 3 ID_1 ["retailer 1", "retailer 2", "retailer 3"] 1 ID_2 ["retailer 1", "retailer 2"] 0 Hope that makes sense.

Reputation: 49

Finding a string within a Pandas Column that Contains a List

I'm fairly new to Pandas and am really stumped on a project.

I have a Pandas dataframe that looks like this:

ID	Retailers
ID_1	["retailer 1", "retailer 2", "retailer 3"]
ID_2	["retailer 1", "retailer 2"]

My goal is:

Loop through a list of all possible retailers.
Create a column for each possible retailer
If the current retailer is in the row's list of retailers, then they should be coded as a 1
If that retailer is not in the list, then they should be coded as a 0

Using retailer 3 as an example:

ID	Retailers	retailer 3
ID_1	["retailer 1", "retailer 2", "retailer 3"]	1
ID_2	["retailer 1", "retailer 2"]	0

Hope that makes sense.

Upvotes: 3

Answers (3)

rhug123

Reputation: 8768

Try this, and see if it gives you what you were expecting.

df.join(df['Retailers'].explode().str.get_dummies().groupby(level=0).sum())

Upvotes: 0

ggaurav

Reputation: 1804

df = df.set_index('ID')
df
        Retailers
ID  
ID_1    [retailer 1, retailer 2, retailer 3]
ID_2    [retailer 1, retailer 2]

Firstly, you can explode the Retailers columns. Then you can add a dummy column of 1 to populate 1 wherever required in the final table

temp = df['Retailers'].explode().to_frame()
temp['vals'] = 1
temp
        Retailers   vals
ID      
ID_1    retailer 1  1
ID_1    retailer 2  1
ID_1    retailer 3  1
ID_2    retailer 1  1
ID_2    retailer 2  1

Then you can create pivot table using pivot to get the table. The NaNs can be populated with 0.

out_df = (pd.pivot(temp, columns='Retailers')
          .fillna(0)
          .astype(int)
          .droplevel(level=0, axis=1) # formating, remove extra level of columns
          .rename_axis(None, axis=1)) # formating, remove the axis name
out_df

As the Retailers column has to be retained,

out_df.insert(0, 'Retailers', df['Retailers'])
out_df

Upvotes: 1

jjsantoso

Reputation: 1758

This is a way to do it:

import pandas as pd

df = pd.DataFrame({'ID': {0: 'ID_1', 1: 'ID_2'},
 'Retailers': {0: ["retailer 1", "retailer 2", "retailer 3"],
  1: ["retailer 1", "retailer 2"]},
 'retailer 3': {0: 1, 1: 0}})

# get unique retailers
retailers = set(df['Retailers'].sum())
# create variables
for r in retailers:
    df[r] = df['Retailers'].apply(lambda x: r in x).astype(int)

Upvotes: 1

Finding a string within a Pandas Column that Contains a List

Answers (3)

Related Questions