Reputation: 497
When I run the code below on the following data, I get the error: SyntaxError: unexpected EOF while parsing
I have a folder containing multiple CSV files. I need to process each file and split the value of Column2 on the ";" delimiter. Once the values are split, each key should become a column name and its value the column value.
Column1 Column2
Item1 Material, Teflon ; MODEL: 28' Inches ; MAKE : SAMSUNG ; SUPPLIER/PO DETAILS: AW Tech ; POWER INPUT :65W @240 VOLTS ; NO OF INPUTS : 4 ; METHOD : AIR COOLED ; TYPE : LED
Item1 Material, PLASTIC ; MODEL: 55' Inches ; MAKE : SONY ; SUPPLIER/PO DETAILS: DK MART ; POWER INPUT :55W @240 VOLTS ; NO OF INPUTS : 5 ; METHOD : NEO AIR COOLED ; TYPE : SMART LED
Item1 Material, Teflon ; MODEL: 42' Inches ; MAKE : LG ; SUPPLIER/PO DETAILS: AW Tech ; POWER INPUT :65W @240 VOLTS ; NO OF INPUTS : 4 ; METHOD : AIR COOLED ; TYPE : LED
NaN
NaN
Item1 MATERIAL, PLASTIC ; MAKE : VIDEOCON ; POWER INPUT : 22V /240 VOLT ; COMPLETED UNIT : SPARES
Item1 MATERIAL ; MAKE : SONY ; SUPPLIER/PO DETAILS: AW Tech; ; COMPLETED UNIT : UNIT PARTS
Expected Output
Item | MODEL | Make | Supplier/PO Details | Power Input | No Of Inputs | Method | Type | Completed Units
Material, Teflon | 28' Inches | SAMSUNG | AW Tech | 65W @240 VOLTS | 4 | Air Cooled | LED |
Material, PLASTIC | 55' Inches | SONY | DK Material | 55W @240 VOLTS | 5 | NEO AIR COOLED | Smart LED |
Material, Teflon | 42' Inches | LG | AW Tech | 65W @240 VOLTS | 4 | Air Cooled | LED |
MATERIAL, PLASTIC | | VIDEOCON | | 22V /240 VOLTS | | | | SPARES
Material | | SONY | AW Tech | | | | | UNIT PARTS
Code I have been trying:
import glob
from ast import literal_eval

import pandas as pd

path = r"C:\Users\Input\Tests\*.csv"
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    my_list = list(df.columns)
    print(len(my_list), my_list)
    # Split off the item name (the text before the first ' ; ')
    out = df['Column2'].str.title().str.split(' ; ', n=1, expand=True)
    # Rewrite the remaining "key : value ; ..." text as a dict literal
    newout = ('{"' + out[1].replace({':': '":"', ' ; ': '","'}, regex=True) + '"}')
    newout = newout.str.rsplit(',', n=1, expand=True)
    m = ~(newout[1].str.contains(':').fillna(True))
    newout.loc[m, 0] = newout.loc[m, 0] + ':' + newout.loc[m, 1]
    newout.loc[~m, 0] = newout.loc[~m, 0] + ',' + newout.loc[~m, 1]
    # Evaluate each dict literal and build one row per record
    newout = pd.DataFrame(newout[0].dropna().map(literal_eval).tolist())
    newout.insert(0, 'Item', out[0])
    newout.columns = newout.columns.str.strip()
Upvotes: 1
Views: 129
Reputation: 1005
If your error is still not fixed, you can try this approach instead.
Solution:
import pandas as pd

# Assuming you can loop over the CSV folder, then:
df = pd.read_csv('data_.csv')
df.dropna(subset=["Column2"], inplace=True)
new_data = {'Item': {}}
for index, row in enumerate(df['Column2'].to_list()):
    row_values = row.split(';')
    new_data["Item"][index] = row_values[0].strip()
    for kv in row_values[1:]:
        key_value = kv.split(':')
        if len(key_value) != 2:
            continue
        key = key_value[0].strip()
        value = key_value[1].strip()
        if key in new_data:
            new_data[key][index] = value
        else:
            new_data[key] = {index: value}
new_df = pd.DataFrame(new_data)
print(new_df)
Output:
Note: this assumes you already loop over the CSV files in the directory yourself.
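For the folder loop that this answer leaves out, here is a minimal sketch. The names parse_column2 and process_folder are hypothetical helpers, not part of pandas, and the glob pattern is whatever matches your files. It also uses str.partition instead of the len(key_value) != 2 check, so a value that itself contains a colon is kept rather than skipped, which is a small change from the code above.

```python
import glob

import pandas as pd


def parse_column2(df: pd.DataFrame, column: str = "Column2") -> pd.DataFrame:
    """Turn 'Item ; KEY : value ; ...' strings into one row of columns each.

    Hypothetical helper: assumes the first ';'-separated part is the item
    name and every later part looks like 'KEY : value'.
    """
    records = []
    for text in df[column].dropna():
        parts = [part.strip() for part in text.split(";")]
        record = {"Item": parts[0]}
        for part in parts[1:]:
            # Split on the first colon only, so values may contain colons.
            key, sep, value = part.partition(":")
            if not sep:
                continue  # empty or malformed fragment, e.g. a stray ';'
            record[key.strip()] = value.strip()
        records.append(record)
    return pd.DataFrame(records)


def process_folder(pattern: str) -> dict:
    """Parse every CSV matching the glob pattern; returns {filename: DataFrame}."""
    return {fname: parse_column2(pd.read_csv(fname)) for fname in glob.glob(pattern)}
```

Calling process_folder(r"C:\Users\Input\Tests\*.csv") would then give one parsed DataFrame per file, with missing keys left as NaN.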
Upvotes: 1
Reputation: 3
An EOF error is usually simple to track down. I would use print statements inside your for loop to see where the code is failing. Take a deep breath and follow this guide step by step: https://careerkarma.com/blog/python-syntaxerror-unexpected-eof-while-parsing/
import glob
from ast import literal_eval

import pandas as pd

path = r"C:\Users\Input\Tests\*.csv"
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    # Print the df you are looking at so you can see what data is not
    # being processed in your for loop
    print(df)
I would comment out the remaining lines until you are sure your df is correct, then add the next lines of code:
    my_list = list(df.columns)
    print(len(my_list), my_list)
    out = df['Column2'].str.title().str.split(' ; ', n=1, expand=True)
    # You parse here, so I would place a print statement
    print(out)
These would be the next lines to add once you have gone through those. Remember to comment out the earlier print statements as you add more code, so you can check that the data is being handled correctly while you debug. Learning to debug this is more important than getting the answer.
    newout = ('{"' + out[1].replace({':': '":"', ' ; ': '","'}, regex=True) + '"}')
    newout = newout.str.rsplit(',', n=1, expand=True)
    # You parse on this line
    print(newout)
    m = ~(newout[1].str.contains(':').fillna(True))
    newout.loc[m, 0] = newout.loc[m, 0] + ':' + newout.loc[m, 1]
    newout.loc[~m, 0] = newout.loc[~m, 0] + ',' + newout.loc[~m, 1]
    newout = pd.DataFrame(newout[0].dropna().map(literal_eval).tolist())
    # This would need a print statement
    print(newout)
    newout.insert(0, 'Item', out[0])
    newout.columns = newout.columns.str.strip()
This would be my first suggestion for seeing where your end of file is happening. I would also break this up into cells in Google Colab so you can follow the logic behind what you are coding, and add an if (end of file) ... break to stop the for loop while you debug your splits.
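If the error is raised at run time by literal_eval rather than by the script itself (an assumption, since the traceback is not shown), the print-and-check idea can be narrowed to the strings being evaluated. A rough sketch follows; find_unparsable is a hypothetical helper and the sample strings are made up for illustration.

```python
from ast import literal_eval


def find_unparsable(strings):
    """Return (position, text, error) for each string literal_eval rejects."""
    failures = []
    for position, text in enumerate(strings):
        try:
            literal_eval(text)
        except (SyntaxError, ValueError) as exc:
            failures.append((position, text, repr(exc)))
    return failures


# Hypothetical samples: the second one is missing its closing brace.
samples = ['{"Make": "Sony"}', '{"Make": "Sony"', '{"Type": "Led"}']
for position, text, error in find_unparsable(samples):
    print(position, text, error)
```

In the question's code you could pass newout[0].dropna() to this helper just before the map(literal_eval) line to list the rows whose generated dict literal is malformed.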
Upvotes: 0