Reputation: 521
Is there a way to force Pandas to read every single column/row from excel file cell by cell into Dataframe?
I've got some very funky spreadsheets with lots of weird formatting - merged cells, missed lines, etc. The parser goes bananas trying to read that.
Is it possible to force pandas to create n rows by m columns dataframe which content would correspond directly to the content of excel sheet subset with the same dimensions starting at "A1" cell?
One option is to save it as .csv in excel(writing a vba script to do it more multiple files) and read into pandas as .csv but i was looking for easier solution.
Upvotes: 2
Views: 393
Reputation: 3583
I realize this question is really old. Maybe this will still be useful to somebody googleing it
One way is to extract the information using openpyxl yourself like so
import openpyxl
import numpy as np
import pandas as pd
wb = openpyxl.load_workbook(filename="path/my_file.xlsx")
ws = wb.worksheets.pop()
n = len(list(ws.iter_cols()))
m = len(list(ws.iter_rows()))
data = np.empty([m,n], dtype=object)
for j,col in enumerate(ws.iter_cols()):
for i,cell in enumerate(col):
data[i,j] = cell.value
df = pd.DataFrame(data)
Upvotes: 0