makman

Reputation: 139

ParseError: Error tokenizing data. C error: Expected 50 fields in line 224599, saw 51

I'm trying to pd.concat multiple .xlsx files into a master CSV and then combine this CSV with past CPU data, which is also in CSV format.

The first operation succeeds (op 3 out of 8); however, on the second pass (history + current data in CSV format, op 7 out of 8) I'm getting the ParseError shown below.

I've checked both files and there appears to be no separator conflict, data are in the correct columns etc.

Error tokenizing data. C error: Expected 50 fields in line 224599, saw 51

My code is as follows:

import pandas as pd
import os
import glob

def sremove(fn):
    # Delete the file only if it exists
    if os.path.exists(fn):
        os.remove(fn)

def mergeit():
    df = pd.concat(pd.read_excel(fl) for fl in path1)
    df.to_csv(path2, index = False)

def mergeit2():
    df = pd.concat(pd.read_csv(fl) for fl in path1)
    df.to_csv(path2, index = False)


print("\n#Operation 3 - Incidents Dataset")
print("Incidents Dataset operation has started")
fn = "S:\\CPU CacheU Data\\201920\\Incidents_201920.csv"
sremove (fn)
print("Incidents 2019/20 file has been deleted - Operation 1 of 8")
path1 = glob.glob(r'S:\*CPU CacheU Data\*Inc Dataset\Incidents Dataset*.xlsx')
print ("Path 1 - Incidents 2019/20 folder has been read successfully - Operation 2 of 8")
path2 = "S:\\CPU CacheU Data\\Incidents_201920.csv"
print ("Path 2 - Incidents 2019/20 Dataset File has been read successfully - Operation 3 of 8")
mergeit()
print ("Action has been completed successfully - Incidents Dataset 2019/20 Updated - Operation 4 of 8")
fn = "S:\\CPU CacheU Data\\Incidents_Dataset.csv"
sremove(fn)
print (" Incidents Dataset Old file has been deleted - Operation 5 of 8")
path1 = glob.glob(r'S:\*CPU CacheU Data\*Incidents_*.csv')
print ("Path 1 - Incidents folder has been read successfully - Operation 6 of 8")
path2 = "S:\\CPU CacheU Data\\Incidents_Dataset.csv"
print ("Path 2 - Incidents Dataset File has been read successfully - Operation 7 of 8")
mergeit2()
print ("Path 2 - Incidents Dataset File has been updated successfully - Operation 8 of 8")

A couple of notes:

1) Op 3 out of 8 takes a really long time to run. I'm not sure if that's because of the xlsx to csv conversion.

2) I've tried adding error_bad_lines=False to the read_csv call in mergeit2(), but it seems to take forever to generate the master file.

Upvotes: 0

Views: 162

Answers (1)

user8560167

Reputation:

Check the separators in your CSV file; there may be extra commas inside some cells. read_csv uses sep=',' by default, so a stray comma in a cell produces an extra field on that line. You should probably set a different separator when opening the file, for example pd.read_csv(fl, sep=' ').
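To check this in practice, here is a rough sketch (not part of the original answer) that uses only the standard library csv module to list the lines whose field count differs from the header. find_bad_rows is a hypothetical helper name, and the comma separator and utf-8 encoding are assumptions; adjust them to match your files.

```python
import csv


def find_bad_rows(path, sep=",", encoding="utf-8"):
    """Return (expected_field_count, [(line_number, field_count), ...]).

    The expected count is taken from the header row. Line numbers are
    physical line numbers in the file, counted from 1.
    """
    bad = []
    with open(path, newline="", encoding=encoding) as fh:
        reader = csv.reader(fh, delimiter=sep)
        header = next(reader)
        expected = len(header)
        for row in reader:
            # Blank lines come back as empty lists; ignore them here.
            if row and len(row) != expected:
                bad.append((reader.line_num, len(row)))
    return expected, bad
```

Running it over each file matched by path1 (for instance `expected, bad = find_bad_rows(fl)`) should show whether a row near line 224599 really has 51 fields, and whether the input files agree on the number of columns in the first place. Quoted cells that contain commas are counted as one field, so only genuinely malformed rows are reported.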

Upvotes: 1
