davida28

Reputation: 31

Convert xls to xlsx with excess headers

junior dev here.

Goal: Using Python, convert an .xls file to .xlsx so that the result has a clean header.

Desired header: [image]

My attempt:

I first tried win32com, but pip install failed with the following two errors. I believe that's because I'm on a Mac (win32com ships as part of pywin32, which is Windows-only).

ERROR: Could not find a version that satisfies the requirement win32com (from versions: none)

ERROR: No matching distribution found for win32com

I then followed this post, which doesn't use win32com, but it produced this error:

xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<?xml ve'
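The `found b'<?xml ve'` part of that message suggests the file is not a binary .xls at all but XML text saved with an .xls extension, which xlrd cannot parse. A minimal sketch for checking what a file really is by looking at its first bytes; the function name `sniff_spreadsheet_format` and its return labels are made up for illustration:

```python
# Signatures of the container formats spreadsheets are usually stored in.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .xls
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx is a zip archive
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_spreadsheet_format(path):
    """Guess the real format of a file from its leading bytes (hypothetical helper)."""
    with open(path, "rb") as fh:
        head = fh.read(64)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Text-based exports may start with a byte-order mark and/or whitespace.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<"):
        return "markup"  # XML or HTML saved with a spreadsheet extension
    return "unknown"
```

If this returns "markup", the file has to be read as XML or HTML rather than as Excel; for HTML tables `pandas.read_html` may work, but that depends on what the exporting system actually wrote.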

The other issue I'm running into is the file itself. At the top, there are 6 extra lines that need to be removed. In addition, the header of the actual data table has a mix of merged and unmerged cells, and I'm not certain how to go about fixing that.
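One possible way to handle both problems at once, sketched under assumptions: the sheet is loaded with no header at all (for example `header=None`), the junk rows are skipped by position, and a two-row header is collapsed into single names. Merged cells normally come through with the value in the first cell only and blanks in the rest, so the sketch carries labels to the right in every header row except the last. `flatten_header`, `n_skip` and `n_header` are hypothetical names, and the two-row layout is a guess about the file.

```python
import pandas as pd


def flatten_header(raw, n_skip, n_header=2):
    """Drop n_skip junk rows, then merge n_header header rows into column names."""
    header_rows = raw.iloc[n_skip:n_skip + n_header].values.tolist()

    # Horizontally merged cells: repeat the last seen label to the right.
    # The last header row is left alone so blanks under vertically merged
    # cells are not filled with a neighbour's label.
    for row in header_rows[:-1]:
        last = None
        for i, value in enumerate(row):
            if pd.isna(value):
                row[i] = last
            else:
                last = value

    # Join the non-empty pieces of each column from top to bottom.
    names = []
    for cells in zip(*header_rows):
        parts = [str(v).strip() for v in cells if not pd.isna(v)]
        names.append("_".join(parts))

    body = raw.iloc[n_skip + n_header:].reset_index(drop=True)
    body.columns = names
    return body


# Tiny made-up example: 2 junk rows, then a header where "Score" spans two columns.
raw = pd.DataFrame([
    ["Report", None, None],
    [None, None, None],
    ["ID", "Score", None],
    [None, "Min", "Max"],
    [1, 2, 3],
])
clean = flatten_header(raw, n_skip=2)
print(list(clean.columns))
```

For the file in the question, `n_skip` would be 6 if the description of the extra lines is exact.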

Any suggestions would be helpful and thank you in advance!


Upvotes: 1

Views: 206

Answers (1)

davida28

Reputation: 31

This answers the second part of my question. I'm still not certain how to read in .xls files.

If I convert the file to CSV first, I can remove the top few lines with the command below. skiprows is not a method but a parameter of read_csv (read_excel accepts it too) that skips the top rows of a CSV or xlsx file when the DataFrame is created.

import pandas as pd
df = pd.read_csv('file_name.csv', skiprows=8)  # skip the first 8 lines of the file
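To get from there to the original goal of an .xlsx file, the cleaned DataFrame can be written back out with `to_excel`. This is only a sketch: `load_table` and `csv_to_xlsx` are made-up helper names, the file names are placeholders, and writing .xlsx assumes the openpyxl package is installed.

```python
import pandas as pd


def load_table(source, n_junk_rows):
    """Read a CSV (path or file-like object), skipping the junk lines at the top."""
    return pd.read_csv(source, skiprows=n_junk_rows)


def csv_to_xlsx(csv_path, xlsx_path, n_junk_rows):
    """Write the cleaned table to .xlsx; pandas needs openpyxl for this step."""
    df = load_table(csv_path, n_junk_rows)
    # index=False keeps the DataFrame's row numbers out of the spreadsheet.
    df.to_excel(xlsx_path, index=False)
    return df
```

Usage would look like `csv_to_xlsx('file_name.csv', 'file_name.xlsx', 8)`.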

Upvotes: 0
