Reputation: 11
I have some Excel files with measurements taken from National Instruments' LabView. I'm trying to use Pandas to be able to edit the data but when using read_excel on those Excel files I get the error TypeError: expected <class 'openpyxl.styles.fills.Fill'>
.
The strange part is that if I open the file by hand and click save, without changing anything, read_excel is suddenly able to open the files. The amount of files is unfortunately too much for me to be able to resave by hand. Does anyone have any idea how to solve this problem? I've searched for this problem a lot and found nothing yet. Thanks!
Edit:
The code I'm using is the following.
import pandas as pd
import os
fname = 'C' # All the file I want to open start with C
fextension = '.xlsx'
directory = 'D:/TEST_Raw'
df_list = []
for filename in os.listdir(directory):
if fname in filename and filename.endswith(fextension):
df1 = pd.read_excel(directory + '/' + filename, header = 0, index_col = None, engine = 'openpyxl')
An example file is in this link. If I use this file the program will not run and give the error, but if I open and save the Excel it will run.
Upvotes: 1
Views: 1309
Reputation: 2670
Seems like the source file is corrupt to the point that a standard method of opening the file is not possible (e.g., pd.read_excel()
or pd.ExcelFile()
. If there are too many files to open manually and save...Try a non-standard way of opening the file.
One idea is using the code from: https://blog.adimian.com/2018/09/04/fast-xlsx-parsing-with-python/ (there may be better ways out there).
I tested the sample file using the code from blog.adimian.com (see the Full Code section right at the bottom of the page) and it seems to work. However, the column names are missing and need to be set manually. If the column names are all the same you could loop this for all the files.
Example output:
Upvotes: 1