Reputation: 31
junior dev here.
Goal: Using Python, convert a file from xls to xlsx and end up with a clean header.
My attempt:
My first attempt was to use win32com. However, that didn't work because I received the following two errors when pip installing. I believe it's because I'm on a Mac.
ERROR: Could not find a version that satisfies the requirement win32com (from versions: none)
ERROR: No matching distribution found for win32com
I then followed this post, which doesn't use win32com. However, that produced this error:
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<?xml ve'
The other issue I'm running into is my file itself. At the top, there are 6 extra lines that need to be removed. In addition, my headers of the actual data table have a mix of merged and unmerged cells. I'm not certain how to go about fixing that.
Any suggestions would be helpful and thank you in advance!
Upvotes: 1
Views: 206
Reputation: 31
Answering the second part of my question. Still not certain on how to take in xls files.
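On the xls side, one thing that may help narrow it down: the xlrd error says the file starts with `<?xml`, which suggests it is not a binary xls at all but some XML-based export saved with an .xls extension. Below is a minimal sketch for checking the leading bytes of a file. The function name `sniff_spreadsheet` and the labels it returns are my own, not part of any library, and the check only looks at the first few bytes, so treat the result as a hint.

```python
# Leading bytes of the two common Excel containers.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # binary .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx (zip archive)


def sniff_spreadsheet(path):
    """Guess what a spreadsheet file really is from its first bytes.

    Hypothetical helper: returns one of "xls", "xlsx", "xml", "html", "unknown".
    """
    with open(path, "rb") as fh:
        head = fh.read(64)

    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"

    # Text formats: drop a UTF-8 BOM and leading whitespace before checking.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith(b"<?xml"):
        return "xml"
    if text.startswith((b"<html", b"<!doctype", b"<table")):
        return "html"
    return "unknown"
```

If this reports "xml" or "html", xlrd is probably the wrong reader for the file. Depending on what the content actually is, `pd.read_html` or a plain XML parser might be a better fit, but I have not confirmed that against this particular file.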
If I convert the file to CSV, I can use the command below to remove the top few lines. skiprows is the read_csv parameter that skips rows at the top of the file when the DataFrame is created; read_excel accepts the same parameter for xlsx files.
import pandas as pd
df = pd.read_csv('file_name.csv', skiprows=8)
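For the merged header cells, here is a rough sketch of one way to collapse a multi-row header into a single row of column names. It assumes that a merged cell exports as its value in the first cell followed by blanks, and that only the upper header rows contain cells merged across columns. I have not verified either assumption against the real file. `flatten_header` and its `sep` argument are names I made up for illustration.

```python
def flatten_header(header_rows, sep="_"):
    """Collapse several header rows into one name per column.

    Hypothetical helper. Blank cells (None or empty strings) in every row
    except the last are filled from the cell to their left, which is how a
    horizontally merged cell usually looks after export.
    """
    width = max(len(row) for row in header_rows)
    filled = []
    for i, row in enumerate(header_rows):
        # Normalise cells to stripped strings and pad short rows.
        cells = ["" if c is None else str(c).strip() for c in row]
        cells += [""] * (width - len(cells))
        if i < len(header_rows) - 1:
            # Forward-fill across the row to undo horizontal merges.
            last = ""
            for j, cell in enumerate(cells):
                if cell:
                    last = cell
                else:
                    cells[j] = last
        filled.append(cells)
    # Join the non-empty parts of each column from top to bottom.
    return [sep.join(part for part in column if part) for column in zip(*filled)]


names = flatten_header([
    ["ID", "Sales", "", "Region"],
    ["", "Q1", "Q2", ""],
])
print(names)  # ['ID', 'Sales_Q1', 'Sales_Q2', 'Region']
```

With pandas, the header rows could be read on their own using header=None and nrows (plus keep_default_na=False so blanks stay as empty strings), passed through this function, and assigned to df.columns. After that, df.to_excel('file_name.xlsx', index=False) should write the xlsx file, provided openpyxl is installed.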
Upvotes: 0