Muthu

Reputation: 1

Manipulating series in a dataframe

One column of my dataframe holds a comma-separated list of city names. I want to find the distinct entries, create a new column for each distinct entry, and fill each new column with 1 or 0 depending on whether the row contains that city name. The idea is to use the new columns as features in a logistic regression model.
For example:

Before

Name    City 
Jack    NewYork,Chicago,Seattle
Jill    Seattle, SanFrancisco
Ted     Chicago,SanFrancisco
Bill    NewYork,Seattle

After

Name    NewYork     Chicago     Seattle     SanFrancisco
Jack    1           1           1           0
Jill    0           0           1           1
Ted     0           1           0           1
Bill    1           0           1           0

Upvotes: 0

Views: 27

Answers (1)

foglerit

Reputation: 8269

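You can do this with the Series.str.get_dummies method, which splits each string on a separator and returns one 0/1 indicator column per distinct value: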

import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jack", "Jill", "Ted", "Bill"],
     "City": ["NewYork,Chicago,Seattle", "Seattle,SanFrancisco", "Chicago,SanFrancisco", "NewYork,Seattle"]}
)

# Split City on "," into 0/1 indicator columns and append them to df.
print(pd.concat((df, df.City.str.get_dummies(",")), axis=1))

Result:

   Name                     City  Chicago  NewYork  SanFrancisco  Seattle
0  Jack  NewYork,Chicago,Seattle        1        1             0        1
1  Jill     Seattle,SanFrancisco        0        0             1        1
2   Ted     Chicago,SanFrancisco        1        0             1        0
3  Bill          NewYork,Seattle        0        1             0        1
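
Note that the sample data in the question has a space after one comma ("Seattle, SanFrancisco"). Splitting on "," alone would then likely produce a separate " SanFrancisco" column. Below is a minimal sketch of one way to handle that, assuming whitespace around the commas is never meaningful. It also drops the original City column so the output matches the "After" layout. The variable names cities, dummies and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jack", "Jill", "Ted", "Bill"],
     "City": ["NewYork,Chicago,Seattle", "Seattle, SanFrancisco",
              "Chicago,SanFrancisco", "NewYork,Seattle"]}
)

# Assumption: spaces around commas are noise, so normalise them away
# before splitting; otherwise " SanFrancisco" and "SanFrancisco" would
# be treated as two different cities.
cities = df["City"].str.replace(r"\s*,\s*", ",", regex=True).str.strip()

# One 0/1 indicator column per distinct city name.
dummies = cities.str.get_dummies(sep=",")

# Keep only Name plus the indicator columns.
result = pd.concat([df[["Name"]], dummies], axis=1)
print(result)
```

The indicator columns come out in sorted order (Chicago, NewYork, SanFrancisco, Seattle) rather than in order of first appearance. If the order matters, you can reorder them by indexing result with an explicit column list.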

Upvotes: 1
