Ines
Ines

Reputation: 128

How can I select the calculated cells from excels with pandas dataframe condition?

I am a novice programmer and I'm learning python on my own.

I read the excel

df = pd.read_excel(file, sheet_name=sheet_name, header=3)

Now I would like to know which cells of the excel are calculated with a formula to replace them with a condition.

The calculated cell will have a value according to the column that is

For example,

if df[column1] (have formula_excel in cell):
    this.cell = new value

I hope I have explained, thank you !!

Upvotes: 0

Views: 135

Answers (1)

MDR
MDR

Reputation: 2670

Maybe not the perfect answer, but openpyxl may be able to shine some light on what formulae is in an Excel file (before putting it into a dataframe.

As one example (shown below) you could pre-process the Excel file to show the formulae and then get it into a dataframe.

Given:

A spreadsheet that looks like (normal view and showing formulae)...

enter image description here

Code:

import pandas as pd
import numpy as np
from openpyxl import load_workbook
# from openpyxl.utils import FORMULAE

df = pd.read_excel('calcs.xlsx')

print(df)
print('\n\n')
'''Shows (Col3 is all formulae but it's not showing):

   Col1  Col2         Col3
0   858     3  1368.333333
1   651     1  4105.000000
2   448     5   821.000000
3   210     4  1026.250000
4   301     6   684.166667
5   671     2  2052.500000
6   420     5   821.000000
7   397     1  4105.000000
8   149     5   821.000000

'''

# Note: 'wb' and 'ws' are shorthand for workbook and worksheet
wb = load_workbook(filename = 'calcs.xlsx')
ws = wb['Sheet1']

# print Col3 (aka column C)
for cell in ws['C']:
   print(cell.value)

print('\n\n')

'''Shows:

Col3
=SUM(A$2:A$10)/B2
=SUM(A$2:A$10)/B3
=SUM(A$2:A$10)/B4
=SUM(A$2:A$10)/B5
=SUM(A$2:A$10)/B6
=SUM(A$2:A$10)/B7
=SUM(A$2:A$10)/B8
=SUM(A$2:A$10)/B9
=SUM(A$2:A$10)/B10

'''

# add a single quote to the start of all cells in Col3 (aka column C)
for cell in ws['C']:
   cell.value = "'" + str(cell.value)

# create a dataframe with the updated values (showing the forumulae in Col3)
df = pd.DataFrame(ws.values)
df.columns = df.iloc[0]
df = df[1:]

print(df)

'''Shows:

0 Col1 Col2                'Col3
1  858    3   '=SUM(A$2:A$10)/B2
2  651    1   '=SUM(A$2:A$10)/B3
3  448    5   '=SUM(A$2:A$10)/B4
4  210    4   '=SUM(A$2:A$10)/B5
5  301    6   '=SUM(A$2:A$10)/B6
6  671    2   '=SUM(A$2:A$10)/B7
7  420    5   '=SUM(A$2:A$10)/B8
8  397    1   '=SUM(A$2:A$10)/B9
9  149    5  '=SUM(A$2:A$10)/B10

'''

print('\n\n')

So you now have a dataframe with the formulae showing in Col3 and could further process the frame in Pandas to do if this, then etc. and eventually delete the original Col3 once you get what you need.

Upvotes: 1

Related Questions