NKJ

Reputation: 497

Create new columns from existing column values using Split function in Python

While running the code below on the following data, I get this error: SyntaxError: unexpected EOF while parsing

I have a folder containing multiple CSV files. I need to process each file and split the value of Column2 on the ";" delimiter. Once the values are split, each key should become a column name and each key's value the column value.

Column1   Column2 

Item1    Material, Teflon ; MODEL: 28' Inches ; MAKE : SAMSUNG ; SUPPLIER/PO DETAILS: AW Tech ; POWER INPUT :65W @240 VOLTS ; NO OF INPUTS : 4 ; METHOD : AIR COOLED ; TYPE : LED
Item1    Material, PLASTIC ; MODEL: 55' Inches ; MAKE : SONY ; SUPPLIER/PO DETAILS: DK MART ; POWER INPUT :55W @240 VOLTS ; NO OF INPUTS : 5 ; METHOD : NEO AIR COOLED ; TYPE : SMART LED
Item1    Material, Teflon ; MODEL: 42' Inches ; MAKE : LG ; SUPPLIER/PO DETAILS: AW Tech ; POWER INPUT :65W @240 VOLTS ; NO OF INPUTS : 4 ; METHOD : AIR COOLED ; TYPE : LED
NaN
NaN
Item1     MATERIAL, PLASTIC ; MAKE        : VIDEOCON ; POWER INPUT        : 22V /240 VOLT ; COMPLETED UNIT : SPARES
Item1    MATERIAL ; MAKE : SONY ; SUPPLIER/PO DETAILS: AW Tech; ; COMPLETED UNIT : UNIT PARTS

Expected Output

Item                MODEL       Make    Supplier/PO Details  Power Input    No Of Inputs  Method      Type      Completed Units

Material, Teflon    28' Inches  SAMSUNG       AW Tech        65W @240 VOLTS     4        Air Cooled    LED        
Material, PLASTIC   55' Inches  SONY          DK MART        55W @240 VOLTS     5      NEO AIR COOLED Smart LED    
Material, Teflon    42' Inches  LG            AW Tech        65W @240 VOLTS     4        Air Cooled    LED  
MATERIAL, PLASTIC               VIDEOCON                     22V /240 VOLTS                                         SPARES
Material                        SONY          AW Tech                                                               UNIT PARTS

Code I have been trying:

import glob
import pandas as pd
from ast import literal_eval
path = r"C:\Users\Input\Tests\*.csv"
for fname in glob.glob(path):
  df=pd.read_csv(fname)
  my_list=list(df.columns)
  print(len(my_list),my_list)           
  out = df['Column2'].str.title().str.split(' ; ',1,expand=True)
  newout=('{"'+out[1].replace({':':'":"',' ; ':'","'},regex=True)+'"}')
  newout=newout.str.rsplit(',',1,expand=True)
  m=~(newout[1].str.contains(':').fillna(True))
  newout.loc[m,0]=newout.loc[m,0]+':'+newout.loc[m,1]
  newout.loc[~m,0]=newout.loc[~m,0]+','+newout.loc[~m,1]
  newout=pd.DataFrame(newout[0].dropna().map(literal_eval).tolist())
  newout.insert(0,'Item',out[0])
  newout.columns=newout.columns.str.strip()

Upvotes: 1

Views: 129

Answers (2)

Abhi

Reputation: 1005

In case your error is still not fixed, you can try this approach instead. It avoids literal_eval and builds the columns with plain string splits:

import pandas as pd

# Assuming you can loop over the CSV folder, then for each file:
df = pd.read_csv('data_.csv')
df.dropna(subset=["Column2"], inplace=True)  # skip the NaN rows

# One dict per output column, each mapping row position -> value
new_data = {'Item': {}}
for index, row in enumerate(df['Column2'].to_list()):
    row_values = row.split(';')
    # The first chunk has no key; it becomes the Item column
    new_data["Item"][index] = row_values[0].strip()
    for kv in row_values[1:]:
        key_value = kv.split(':')
        # Skip empty chunks (e.g. from "; ;") and anything that is not key:value
        if len(key_value) != 2:
            continue

        key = key_value[0].strip()
        value = key_value[1].strip()

        if key in new_data:
            new_data[key][index] = value
        else:
            new_data[key] = {index: value}

new_df = pd.DataFrame(new_data)
print(new_df)


Note: this assumes you already loop over the CSV files in the directory.
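
For the looping part, here is a minimal sketch, assuming every file has a Column2 laid out like the sample in the question. The names parse_row and parse_folder are made up for illustration, and the key.title() normalization is my own assumption so that keys such as MAKE and Make end up in the same column.

```python
import glob

import pandas as pd


def parse_row(text):
    """Split 'Item ; KEY : value ; ...' into a dict (hypothetical helper)."""
    parts = [p.strip() for p in text.split(";")]
    record = {"Item": parts[0]}  # the first chunk has no key
    for part in parts[1:]:
        # partition splits on the first colon only
        key, sep, value = part.partition(":")
        if not sep:  # empty chunk or no colon, e.g. from "AW Tech; ;"
            continue
        # Assumption: normalize key casing so "MAKE" and "Make" match
        record[key.strip().title()] = value.strip()
    return record


def parse_folder(pattern):
    """Parse Column2 of every CSV matching the glob pattern (hypothetical helper)."""
    frames = []
    for fname in glob.glob(pattern):
        df = pd.read_csv(fname).dropna(subset=["Column2"])
        frames.append(pd.DataFrame([parse_row(t) for t in df["Column2"]]))
    # Keys missing from a row simply become NaN in that row
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With the path from the question this could be called as parse_folder(r"C:\Users\Input\Tests\*.csv"). Because each row becomes its own dict, rows with fewer keys do not need special handling.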

Upvotes: 1

Jessica Woods

Reputation: 3

An unexpected-EOF error is usually simple to track down. I would put print statements inside your for loop to see where the code is blowing up. Take a deep breath and follow this guide step by step: https://careerkarma.com/blog/python-syntaxerror-unexpected-eof-while-parsing/

import glob
import pandas as pd
from ast import literal_eval
path = r"C:\Users\Input\Tests\*.csv"
for fname in glob.glob(path):
  df=pd.read_csv(fname)
  # print the df you are looking at so you can see what data is not
  # being processed in your for loop
  print(df)

I would comment out the remaining lines first to make sure your df is correct, then add the next lines of code:

  my_list=list(df.columns)
  print(len(my_list),my_list)           
  out = df['Column2'].str.title().str.split(' ; ',1,expand=True)
  # So you parse here; I would place a print statement
  print(out)

These would be the next lines to add once you have gone through those. Remember to comment out the earlier print statements as you add more code, so you can confirm the data is being handled correctly while you debug. Learning to debug this is more important than getting the answer.

  newout=('{"'+out[1].replace({':':'":"',' ; ':'","'},regex=True)+'"}')
  newout=newout.str.rsplit(',',1,expand=True)
  # you parse on this line
  print(newout)
  m=~(newout[1].str.contains(':').fillna(True))
  newout.loc[m,0]=newout.loc[m,0]+':'+newout.loc[m,1]
  newout.loc[~m,0]=newout.loc[~m,0]+','+newout.loc[~m,1]
  newout=pd.DataFrame(newout[0].dropna().map(literal_eval).tolist())
  # this would need a print statement
  print(newout)
  newout.insert(0,'Item',out[0])
  newout.columns=newout.columns.str.strip()

This would be my first suggestion for seeing where your end of file is happening. I would also break this up into cells in Google Colab so you can follow the logic behind what you're coding, and add an if (end of file) ... break to stop the for loop while debugging your splits.
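
As an alternative to printing whole frames, one option is to wrap literal_eval in a try/except and collect only the inputs it rejects. This is a sketch, not the asker's code: find_bad_literals is a hypothetical helper and the sample strings are made up. literal_eval raises SyntaxError (or ValueError) when a string is not a complete Python literal, which is a likely source of the "unexpected EOF" message in this loop.

```python
from ast import literal_eval


def find_bad_literals(strings):
    """Return (position, text, error name) for every string literal_eval rejects."""
    failures = []
    for pos, text in enumerate(strings):
        try:
            literal_eval(text)
        except (SyntaxError, ValueError) as exc:
            failures.append((pos, text, type(exc).__name__))
    return failures


# Made-up samples: the second one is cut off before the closing quote and brace
samples = [
    '{"Make": "Sony", "Type": "Led"}',
    '{"Make": "Sony", "Type": "Led',
]
for pos, text, err in find_bad_literals(samples):
    print(pos, err, text)
```

In the question's loop this could be run as find_bad_literals(newout[0].dropna()) just before the literal_eval line. The reported positions then count within the filtered series, not the original CSV rows.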

Upvotes: 0
