Colton T
Colton T

Reputation: 328

pandas read excel values not formulas

Is there a way to have pandas read in only the values from excel and not the formulas? It reads the formulas in as NaN unless I go in and manually save the excel file before running the code. I am just working with the basic read excel function of pandas,

import pandas as pd

df = pd.read_excel(filename, sheetname="Sheet1")

This will read the values if I have gone in and saved the file prior to running the code. But after running the code to update a new sheet, if I don't go in and save the file after doing that and try to run this again, it will read the formulas as NaN instead of just the values. Is there a work around that anyone knows of that will just read values from excel with pandas?

Upvotes: 20

Views: 35823

Answers (4)

Rajat Tyagi
Rajat Tyagi

Reputation: 378

you can use xlrd to read the values. first you should refresh your excel sheet you are also updating the values automatically with python. you can use the function below file = myxl.xls

import xlrd
import win32com.client
import os

def refresh_file(file):
    xlapp = win32com.client.DispatchEx("Excel.Application")
    path = os.path.abspath(file)
    wb =  xlapp.Wordbooks.Open(path)
    wb.RefreshAll()
    xlapp.CalculateUntilAsyncqueriesDone()
    wb.Save()
    xlapp.Quit()

after the file refresh, you can start reading the content.

workbook = xlrd.open_workbook(file)
worksheet = workbook.sheet_by_index(0)
for rowid in range(worksheet.nrows):
    row = worksheet.row(rowid)
    for colid, cell in enumerate(row):
        print(cell.value)

you can loop through however you need the data. and put conditions while you are reading the data. lot more flexibility

Upvotes: 0

Matteo Cavuoti
Matteo Cavuoti

Reputation: 1

I had the exact same problem, but i found out that (in my case) there was no problem.

I use the EXCEL VIEWER extension. Looking in the excel file with that extension it looks like the values are not there, instead the formula is there, if you click on the cell that says "#NAME?.

Anyway ... after doing pd.read_excel the dataframe works correctly.

Upvotes: -1

jeansergecardinal
jeansergecardinal

Reputation: 53

I had this problem and I resolve it by moving a graph below the first row I was reading. Looks like the position of the graphs may cause problems.

Upvotes: 0

RobatStats
RobatStats

Reputation: 445

That is strange. The normal behaviour of pandas is read values, not formulas. Likely, the problem is in your excel files. Probably your formulas point to other files, or they return a value that pandas sees as nan.

In the first case, the sheet needs to be updated and there is nothing pandas can do about that (but read on).

In the second case, you could solve by setting explicit nan values in read_excel:

pd.read_excel(path, sheetname="Sheet1", na_values = [your na identifiers])

As for the first case, and as a workaround solution to make your work easier, you can automate what you are doing by hand using xlwings:

import pandas as pd
import xlwings as xl

def df_from_excel(path):
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path)

df = df_from_excel(path to your file)

If you want to keep those formulas in your excel file just save the file in a different location (book.save(different location)). Then you can get rid of the temporary files with shutil.

Upvotes: 12

Related Questions