Miguel Barrientos
Miguel Barrientos

Reputation: 49

Finding a string within a Pandas Column that Contains a List

I'm fairly new to Pandas and am really stumped on a project.

I have a Pandas dataframe that looks like this:

ID Retailers
ID_1 ["retailer 1", "retailer 2", "retailer 3"]
ID_2 ["retailer 1", "retailer 2"]

My goal is:

Using retailer 3 as an example:

ID Retailers retailer 3
ID_1 ["retailer 1", "retailer 2", "retailer 3"] 1
ID_2 ["retailer 1", "retailer 2"] 0

Hope that makes sense.

Upvotes: 3

Views: 88

Answers (3)

rhug123
rhug123

Reputation: 8768

Try this, and see if it gives you what you were expecting.

df.join(df['Retailers'].explode().str.get_dummies().groupby(level=0).sum())

Upvotes: 0

ggaurav
ggaurav

Reputation: 1804

df = df.set_index('ID')
df
        Retailers
ID  
ID_1    [retailer 1, retailer 2, retailer 3]
ID_2    [retailer 1, retailer 2]

Firstly, you can explode the Retailers columns. Then you can add a dummy column of 1 to populate 1 wherever required in the final table

temp = df['Retailers'].explode().to_frame()
temp['vals'] = 1
temp
        Retailers   vals
ID      
ID_1    retailer 1  1
ID_1    retailer 2  1
ID_1    retailer 3  1
ID_2    retailer 1  1
ID_2    retailer 2  1

Then you can create pivot table using pivot to get the table. The NaNs can be populated with 0.

out_df = (pd.pivot(temp, columns='Retailers')
          .fillna(0)
          .astype(int)
          .droplevel(level=0, axis=1) # formating, remove extra level of columns
          .rename_axis(None, axis=1)) # formating, remove the axis name
out_df

enter image description here

As the Retailers column has to be retained,

out_df.insert(0, 'Retailers', df['Retailers'])
out_df

enter image description here

Upvotes: 1

jjsantoso
jjsantoso

Reputation: 1758

This is a way to do it:

import pandas as pd

df = pd.DataFrame({'ID': {0: 'ID_1', 1: 'ID_2'},
 'Retailers': {0: ["retailer 1", "retailer 2", "retailer 3"],
  1: ["retailer 1", "retailer 2"]},
 'retailer 3': {0: 1, 1: 0}})

# get unique retailers
retailers = set(df['Retailers'].sum())
# create variables
for r in retailers:
    df[r] = df['Retailers'].apply(lambda x: r in x).astype(int)

enter image description here

Upvotes: 1

Related Questions