flipper
flipper

Reputation: 521

Parsing excel files with Pandas

Is there a way to force Pandas to read every single column/row from excel file cell by cell into Dataframe?

I've got some very funky spreadsheets with lots of weird formatting - merged cells, missed lines, etc. The parser goes bananas trying to read that.

Is it possible to force pandas to create n rows by m columns dataframe which content would correspond directly to the content of excel sheet subset with the same dimensions starting at "A1" cell?

One option is to save it as .csv in excel(writing a vba script to do it more multiple files) and read into pandas as .csv but i was looking for easier solution.

Upvotes: 2

Views: 393

Answers (1)

Lukas S
Lukas S

Reputation: 3583

I realize this question is really old. Maybe this will still be useful to somebody googleing it

One way is to extract the information using openpyxl yourself like so

import openpyxl
import numpy as np
import pandas as pd

wb = openpyxl.load_workbook(filename="path/my_file.xlsx")

ws = wb.worksheets.pop()

n = len(list(ws.iter_cols()))
m = len(list(ws.iter_rows()))
data = np.empty([m,n], dtype=object)
for j,col in enumerate(ws.iter_cols()):
    for i,cell in enumerate(col):
        data[i,j] = cell.value
df = pd.DataFrame(data)

Upvotes: 0

Related Questions