Add lists to DF / Expand string to new column and data

Question

I have a CSV file with a column that contains a string with vehicle options.

Brand     Options
Toyota    Color:Black,Wheels:18
Toyota    Color:Black, 
Chevy     Color:Red,Wheels:16,Style:18
Honda     Color:Green,Personalization:"Customer requested detailing"
Chevy     Color:Black,Wheels:16

I want to expand the "Options" string to new columns with the appropriate name. The dataset is considerably large so I am trying to name the columns programmatically (ie: Color, Wheels, Personalization) then apply the respective value to the row or a null value.

Adding new data

import pandas as pd
Cars = pd.read_csv("Cars.CSV")                          # Loads cars into df 
split = Cars["Options"].str.split(",", expand = True)   # Data in form of {"Color:Black", "Wheels:16"}

split[0][0].split(":") # returns ['Color', 'Black']

What is an elegant way to concat these lists to the original dataframe Cars without specified columns?

David Erickson · Accepted Answer

You can prepare for a clean split by first using rstrip to avoid a null column, since you have one row with a comma at the end. Then, after splitting, explode to multiple rows and split again by :, this time using expand=True. Then, pivot the dataset into the desired format and concat back to the original dataframe:

pd.concat([df,
           df['Options'].str.rstrip(',')
                        .str.split(',')
                        .explode()
                        .str.split(':', expand=True)
                        .pivot(values=1, columns=0)],
          axis=1).drop('Options', axis=1)
Out[1]: 
    Brand  Color                 Personalization Style Wheels
0  Toyota  Black                             NaN   NaN     18
1  Toyota  Black                             NaN   NaN    NaN
2   Chevy    Red                             NaN    18     16
3   Honda  Green  "Customer requested detailing"   NaN    NaN
4   Chevy  Black                             NaN   NaN     16

Add lists to DF / Expand string to new column and data

Answers (1)

Related Questions