Create new columns from existing column values using Split function in Python

Question

When I run the code below on the following data, I get the error: SyntaxError: unexpected EOF while parsing

I have a folder containing multiple CSV files. I need to process each file and split the value of Column2 on the ";" delimiter. Once the values are split, each key should become a column name and each key's value the column value.

Column1   Column2 

Item1    Material, Teflon ; MODEL: 28' Inches ; MAKE : SAMSUNG ; SUPPLIER/PO DETAILS: AW Tech ; POWER INPUT :65W @240 VOLTS ; NO OF INPUTS : 4 ; METHOD : AIR COOLED ; TYPE : LED
Item1    Material, PLASTIC ; MODEL: 55' Inches ; MAKE : SONY ; SUPPLIER/PO DETAILS: DK MART ; POWER INPUT :55W @240 VOLTS ; NO OF INPUTS : 5 ; METHOD : NEO AIR COOLED ; TYPE : SMART LED
Item1    Material, Teflon ; MODEL: 42' Inches ; MAKE : LG ; SUPPLIER/PO DETAILS: AW Tech ; POWER INPUT :65W @240 VOLTS ; NO OF INPUTS : 4 ; METHOD : AIR COOLED ; TYPE : LED
NaN
NaN
Item1     MATERIAL, PLASTIC ; MAKE        : VIDEOCON ; POWER INPUT        : 22V /240 VOLT ; COMPLETED UNIT : SPARES
Item1    MATERIAL ; MAKE : SONY ; SUPPLIER/PO DETAILS: AW Tech; ; COMPLETED UNIT : UNIT PARTS

Expected Output

Item                MODEL       Make      Supplier/PO Details  Power Input     No Of Inputs  Method          Type       Completed Units
Material, Teflon    28' Inches  SAMSUNG   AW Tech              65W @240 VOLTS  4             Air Cooled      LED
Material, PLASTIC   55' Inches  SONY      DK MART              55W @240 VOLTS  5             NEO AIR COOLED  Smart LED
Material, Teflon    42' Inches  LG        AW Tech              65W @240 VOLTS  4             Air Cooled      LED
MATERIAL, PLASTIC               VIDEOCON                       22V /240 VOLT                                            SPARES
Material                        SONY      AW Tech                                                                       UNIT PARTS

Code I have been trying:

import glob
from ast import literal_eval

import pandas as pd

path = r"C:\Users\Input\Tests\*.csv"
for fname in glob.glob(path):
  df=pd.read_csv(fname)
  my_list=list(df.columns)
  print(len(my_list),my_list)           
  out = df['Column2'].str.title().str.split(' ; ',1,expand=True)
  newout=('{"'+out[1].replace({':':'":"',' ; ':'","'},regex=True)+'"}')
  newout=newout.str.rsplit(',',1,expand=True)
  m=~(newout[1].str.contains(':').fillna(True))
  newout.loc[m,0]=newout.loc[m,0]+':'+newout.loc[m,1]
  newout.loc[~m,0]=newout.loc[~m,0]+','+newout.loc[~m,1]
  newout=pd.DataFrame(newout[0].dropna().map(literal_eval).tolist())
  newout.insert(0,'Item',out[0])
  newout.columns=newout.columns.str.strip()

Abhi · Accepted Answer

In case your error is not fixed yet, you can try the code below, which is how I approached it.

Solution:

import pandas as pd

# Assuming you can loop over the CSV folder, then for each file:
df = pd.read_csv('data_.csv')
df = df.dropna(subset=["Column2"])  # drop rows with no Column2 value

new_data = {'Item' : {}}
for index, row in enumerate(df['Column2'].to_list()):
    row_values = row.split(';') 
    new_data["Item"][index] = (row_values[0].strip())
    for kv in row_values[1:]:
        key_value = kv.split(':', 1)  # split on the first colon only
        if len(key_value) != 2:
            continue  # skip fragments that are not "KEY : value" pairs

        key = key_value[0].strip()
        value = key_value[1].strip()

        if key in new_data:
            new_data[key][index] = value
        else:
            new_data[key] = {index : value}
            
new_df  = pd.DataFrame(new_data)
print(new_df)
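For context on the original error, here is a minimal sketch of one way literal_eval can raise a SyntaxError like the one in the question. The two strings are hypothetical stand-ins for what the replace/rsplit steps might build, not values taken from the asker's run, and safe_literal_eval is a helper name made up for illustration. The exact wording of the message depends on the Python version.

```python
from ast import literal_eval

# Hypothetical strings, shaped like the dict literals the question's code assembles.
well_formed = '{"Model":"28 Inches","Make":"Samsung"}'
truncated = '{"Make":"Sony","Supplier/Po Details":"Aw Tech"'  # closing brace missing


def safe_literal_eval(text):
    """Return the parsed literal, or None if the text is not a valid literal."""
    try:
        return literal_eval(text)
    except (SyntaxError, ValueError) as exc:
        # Message wording varies between Python versions.
        print(f"could not parse {text!r}: {exc}")
        return None


print(safe_literal_eval(well_formed))
print(safe_literal_eval(truncated))
```

Building dictionaries directly, as in the loop above, avoids having to assemble a string that must be a valid Python literal for every row.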

Note: this assumes you can already loop over the directory of CSV files.
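If the folder loop is the missing piece, the same idea can be wrapped in a function and applied to each file with glob. This is only a sketch: expand_column2 and process_folder are hypothetical names, the column name Column2 comes from the question, and str.partition is used here in place of split(':') so that empty fragments (such as the one in "AW Tech; ;") are skipped.

```python
import glob

import pandas as pd


def expand_column2(df, column="Column2"):
    """Split 'Item ; KEY : value ; ...' strings into one column per key."""
    records = []
    for text in df[column].dropna():  # rows with no value are ignored
        parts = text.split(";")
        record = {"Item": parts[0].strip()}
        for part in parts[1:]:
            key, sep, value = part.partition(":")
            if not sep or not key.strip():
                continue  # fragment has no "KEY : value" shape
            record[key.strip()] = value.strip()
        records.append(record)
    # Keys missing from a row become NaN in that row.
    return pd.DataFrame(records)


def process_folder(pattern):
    """Return {file name: expanded DataFrame} for every CSV matching the pattern."""
    return {
        fname: expand_column2(pd.read_csv(fname))
        for fname in glob.glob(pattern)
    }
```

It could then be called with the path from the question, for example process_folder(r"C:\Users\Input\Tests\*.csv"), and each resulting DataFrame written out or concatenated as needed.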
