Klatten

Reputation: 312

Python: Keep leading zeroes when converting from Excel to CSV with pandas

I have an Excel sheet that is to be inserted into a database. I wrote a Python script that takes an Excel file, converts it to a CSV, and then inserts it into the database. The problem is that the Excel sheet contains zipcodes, and the conversion unfortunately removes their leading zeroes.

Here is my code that reads the Excel sheet and writes it to a CSV:

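import csv
import pandas as pd

def excel_to_csv():
    # excel_path, fileName and csv_file are defined elsewhere in the script
    xlsx = pd.read_excel(excel_path + fileName + '.xlsx')
    xlsx.to_csv(csv_file, encoding='utf-8', index=False, na_rep=None, quoting=csv.QUOTE_NONE)


excel_to_csv()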

And then I use this code to insert it into the database:

with open(csv_file, 'rb') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    next(reader)
    for row in reader:
        cur.execute(
            "INSERT INTO table (foo1, foo2, zipcode, foo3) VALUES (%s, %s, %s, %s); ",
            row
        )

conn.commit()
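
For what it's worth, the insert step itself should not be where the zeroes get lost: csv.reader returns every field as a string. Here is a minimal sketch to check that, assuming Python 3 (where the file has to be opened in text mode rather than 'rb'). The sample row and the read_rows helper are made up for illustration and are not part of the script above.

```python
import csv
import io

# Hypothetical CSV content in which the zipcode still has its leading zeroes.
sample = "foo1,foo2,zipcode,foo3\n353453452,DATA,0037,CITY\n"


def read_rows(text_file):
    """Return all data rows (header skipped) as lists of strings."""
    reader = csv.reader(text_file, delimiter=",")
    next(reader)  # skip the header row
    return [row for row in reader]


# io.StringIO stands in for open(csv_file, newline="") on a real file.
rows = read_rows(io.StringIO(sample))
print(rows[0][2])  # prints 0037
```

So if the CSV contains the zeroes, they reach the INSERT unchanged; the loss happens earlier, when the Excel file is read.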

When I print out my CSV after it's converted from Excel, I get this result:

foo1,foo2,zipcode,foo3
353453452,DATA,37,CITY
463464356,DATA,2364,CITY

The zipcode cells in the Excel file are formatted as text, so they keep their leading zeroes there. How can I keep the leading zeroes when I convert the Excel file to CSV?

Upvotes: 4

Views: 9590

Answers (1)

SpghttCd

Reputation: 10860

From the docs:

dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}. Use object to preserve data as stored in Excel and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
New in version 0.20.0.

So you can tell pd.read_excel not to interpret the data by setting the dtype keyword argument to object:

xlsx = pd.read_excel(excel_path + fileName + '.xlsx', dtype='object')
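
If you would rather keep pandas' type inference for the other columns, the dict form of dtype from the docs quoted above can be limited to the zipcode column. A rough sketch, assuming the column is named zipcode as in the question and that the cells are stored as text in Excel; the function signature and the text_columns parameter are my own and not from the original script:

```python
import pandas as pd


def excel_to_csv(excel_file, csv_file, text_columns=("zipcode",)):
    """Convert an Excel sheet to CSV, reading the listed columns as text."""
    # Map each listed column to str so pandas does not infer a numeric
    # type for it; all other columns are still inferred as usual.
    dtypes = {column: str for column in text_columns}
    frame = pd.read_excel(excel_file, dtype=dtypes)
    frame.to_csv(csv_file, encoding="utf-8", index=False)
    return frame
```

Called as excel_to_csv(excel_path + fileName + '.xlsx', csv_file), this should write the zipcodes to the CSV exactly as they are stored in the sheet. Note that it cannot bring back zeroes for cells that Excel itself stores as numbers.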

Upvotes: 12
