Reputation: 81
I have a large dataset in CSV format, but reading it fails with "Error tokenizing data".
import glob
import pandas as pd

path = "/Users/LAI/Downloads/learn/engagement_data"
all_files = glob.glob(path + "/*.csv")
print(all_files)

all_csv = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, sep=',')
    all_csv.append(df)

# Concatenate the list that was actually filled (all_csv, not the undefined "li")
engagement_df = pd.concat(all_csv, axis=0, ignore_index=True)
[Image: output of print(all_files)]
[Image: the resulting error]
Upvotes: 1
Views: 5444
Reputation: 611
There's probably an error in one of the CSV files you are reading.
Try using a print statement to figure out which file it is:
import glob
import pandas as pd

path = "/Users/LAI/Downloads/learn/engagement_data"
all_files = glob.glob(path + "/*.csv")
print(all_files)

all_csv = []
for filename in all_files:
    try:
        df = pd.read_csv(filename, index_col=None, header=0, sep=',')
        all_csv.append(df)
    except Exception as e:
        # Report the offending file, then re-raise so the script still stops
        print(f"Problem file: {filename} caused Exception: {e}")
        raise

# Note: the list is named all_csv, not "li"
engagement_df = pd.concat(all_csv, axis=0, ignore_index=True)
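Once you know which file is the problem, it can help to find the exact rows. A tokenizing error usually means some row has a different number of fields than the parser expected. Here is a minimal sketch using only the standard library `csv` module; the helper name `find_bad_rows` is made up for illustration, and it assumes the file is UTF-8 with a header on the first line:

```python
import csv

def find_bad_rows(path, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header.

    Hypothetical helper, not part of pandas. Assumes UTF-8 and a header row.
    """
    bad = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            return bad  # empty file, nothing to check
        expected = len(header)
        for row in reader:
            # Blank lines come back as [] and are ignored here
            if row and len(row) != expected:
                bad.append((reader.line_num, len(row)))
    return bad
```

Calling `find_bad_rows(filename)` on the file reported by the loop above should give you the line numbers to inspect in a text editor.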
Alternatively, you can try changing the parser "engine" to the Python engine (as documented in this blog):
import glob
import pandas as pd

path = "/Users/LAI/Downloads/learn/engagement_data"
all_files = glob.glob(path + "/*.csv")
print(all_files)

all_csv = []
for filename in all_files:
    # engine='python' selects the slower, pure-Python parser instead of the default C parser
    df = pd.read_csv(filename, index_col=None, header=0, sep=',', engine='python')
    all_csv.append(df)

engagement_df = pd.concat(all_csv, axis=0, ignore_index=True)
But it would be better practice to find the problematic CSV file and fix it. You could also combine the two solutions with something like:
import glob
import pandas as pd

path = "/Users/LAI/Downloads/learn/engagement_data"
all_files = glob.glob(path + "/*.csv")
print(all_files)

all_csv = []
for filename in all_files:
    try:
        df = pd.read_csv(filename, index_col=None, header=0, sep=',')
        all_csv.append(df)
    except pd.errors.ParserError as e:
        # Fall back to the Python engine for this file only
        df = pd.read_csv(filename, index_col=None, header=0, sep=',', engine='python')
        all_csv.append(df)
        print(f"Problem file: {filename} caused Exception: {e}")

engagement_df = pd.concat(all_csv, axis=0, ignore_index=True)
Or simply skip that file if it's OK to be missing data:
import glob
import pandas as pd

path = "/Users/LAI/Downloads/learn/engagement_data"
all_files = glob.glob(path + "/*.csv")
print(all_files)

all_csv = []
for filename in all_files:
    try:
        df = pd.read_csv(filename, index_col=None, header=0, sep=',')
        all_csv.append(df)
    except pd.errors.ParserError as e:
        # Log the file and move on; its data will be missing from the result
        print(f"Problem file: {filename} caused Exception: {e}")

engagement_df = pd.concat(all_csv, axis=0, ignore_index=True)
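If losing a whole file is too much, a narrower option may be to drop only the malformed rows. As far as I know, `read_csv` accepts `on_bad_lines="skip"` for this in pandas 1.3 and later (older versions used `error_bad_lines=False`, which has since been removed). A small sketch with made-up inline data, where the third line has one field too many:

```python
import io
import pandas as pd

# Hypothetical sample data: the line "3,4,5" has three fields instead of two
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# on_bad_lines="skip" drops rows with too many fields instead of raising ParserError
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```

The same keyword can be passed to the `pd.read_csv(filename, ...)` call in the loop. Keep in mind that the skipped rows are dropped silently, so use `on_bad_lines="warn"` if you want them reported.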
Upvotes: 1