Reputation: 1100
I have a CSV dataset like the one below, and I need to remove any empty rows from it. I already tried the following code, but it still fails when it reads an empty row and returns:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 11, saw 8
def feed_db():
    try:
        employees = pd.read_csv(
            'employee.csv', delimiter=',', sep='\t', encoding="utf-8", header=None, skipinitialspace=True, skip_blank_lines=True)
        employees.columns = [c.strip().lower().replace(' ', '_')
                             for c in employees.columns]
        employees.to_sql('employees', conn, if_exists='replace', index=False)
    except Error as exc:
        raise Error('Database initialization failed', exc)
Sample dataset:
Employee Id, Full Name, Gender, Date of Birth, Joined Date, Salary (USD), Branch
EN_0001, Aditi Musunur, Male, 1990-03-24, 2011-07-05, 1500, Sri Lanka
EN_0002, Advitiya Sujeet, Male, 1986-07-28, 2010-03-24, 1600, Sri Lanka
EN_0003, Alagesan Poduri, Male, 1982-05-25, 2016-06-24, 1800, Sri Lanka
EN_0004, Amrish Ilyas, Female, 1987-10-24, 2013-12-17, 2000, India
EN_0005, Aprativirya Seshan, Female, 1981-12-16, 2012-03-14, 1750, India
EN_0006, Asvathama Ponnada, Male, 1986-01-09, 2014-06-18, 2300, Pakistan
EN_0007, Avantas Ghosal, Female, 1981-10-05, 2016-08-26, 4200, Pakistan
EN_0008, Avidosa Vaisakhi, Male, 1980-08-09, 2018-03-05, 3100, Bangladesh
EN_0009, Barsati Sandipa, Male, 1988-04-09, 2011-05-03, 2925, Bangladesh
EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
EN_0011, Debas Sundhar, Female, 1990-03-26, 2015-05-18, 2800, Bangladesh
How can I remove those empty rows using pandas?
Upvotes: 0
Views: 488
Reputation: 9619
As mentioned, there is a redundant empty cell in line 11. You can open and fix the CSV, write it to StringIO, then open it with pd.read_csv:
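import io

import pandas as pd

s = io.StringIO()
with open('employee.csv') as file:
    for line in file:
        # Collapse the redundant empty cell (", ,") into a single separator
        s.write(line.replace(", ,", ","))
s.seek(0)  # rewind the buffer so read_csv starts from the beginning
# skipinitialspace drops the space after each comma in names and values
df = pd.read_csv(s, skipinitialspace=True)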
Result:
| | Employee Id | Full Name | Gender | Date of Birth | Joined Date | Salary (USD) | Branch |
---|---|---|---|---|---|---|---|
0 | EN_0001 | Aditi Musunur | Male | 1990-03-24 | 2011-07-05 | 1500 | Sri Lanka |
1 | EN_0002 | Advitiya Sujeet | Male | 1986-07-28 | 2010-03-24 | 1600 | Sri Lanka |
2 | EN_0003 | Alagesan Poduri | Male | 1982-05-25 | 2016-06-24 | 1800 | Sri Lanka |
3 | EN_0004 | Amrish Ilyas | Female | 1987-10-24 | 2013-12-17 | 2000 | India |
4 | EN_0005 | Aprativirya Seshan | Female | 1981-12-16 | 2012-03-14 | 1750 | India |
5 | EN_0006 | Asvathama Ponnada | Male | 1986-01-09 | 2014-06-18 | 2300 | Pakistan |
6 | EN_0007 | Avantas Ghosal | Female | 1981-10-05 | 2016-08-26 | 4200 | Pakistan |
7 | EN_0008 | Avidosa Vaisakhi | Male | 1980-08-09 | 2018-03-05 | 3100 | Bangladesh |
8 | EN_0009 | Barsati Sandipa | Male | 1988-04-09 | 2011-05-03 | 2925 | Bangladesh |
9 | EN_0010 | Debasis Sundhararajan | Female | 1990-03-26 | 2015-05-18 | 2800 | Bangladesh |
10 | EN_0011 | Debas Sundhar | Female | 1990-03-26 | 2015-05-18 | 2800 | Bangladesh |
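A variation on the same idea, as a minimal sketch: instead of replacing the literal string ", ,", parse each line with the standard csv module and drop empty cells only from rows that have more fields than the header. The function name read_csv_dropping_empty_cells is hypothetical, and the approach assumes that an empty cell in an over-long row is always the stray one rather than a genuinely missing value.

```python
import csv
import io

import pandas as pd


def read_csv_dropping_empty_cells(source):
    """Read CSV from a file-like object, repairing rows that are too long.

    Hypothetical helper: rows with more fields than the header have their
    empty cells removed before the data is handed to pandas.
    """
    reader = csv.reader(source, skipinitialspace=True)
    header = [name.strip() for name in next(reader)]

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)

    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank or whitespace-only lines
        if len(cells) > len(header):
            # Assumption: the surplus fields are the empty ones
            cells = [cell for cell in cells if cell]
        writer.writerow(cells)

    buffer.seek(0)  # rewind so pandas reads from the start
    return pd.read_csv(buffer)
```

It could be called as follows, with the file name taken from the question:

```python
with open('employee.csv', newline='') as file:
    df = read_csv_dropping_empty_cells(file)
```

If a row is still too long after its empty cells are removed, pd.read_csv raises the same ParserError, so the problem is not silently hidden.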
Upvotes: 1
Reputation: 103
Your csv file has an extra comma on this line:
EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
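To locate lines like this one in a larger file, one option is to compare each row's field count with the header's. This is only a sketch using the standard csv module; the function name find_malformed_lines is hypothetical, and it assumes the first line of the file is the header.

```python
import csv


def find_malformed_lines(source):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(source, skipinitialspace=True)
    expected = len(next(reader))  # number of fields in the header
    return [
        (reader.line_num, len(row))
        for row in reader
        if row and len(row) != expected  # ignore completely blank lines
    ]
```

For example:

```python
with open('employee.csv', newline='') as file:
    for line_number, field_count in find_malformed_lines(file):
        print(f"line {line_number}: {field_count} fields")
```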
Upvotes: 0