Reputation: 1
I am trying to loop through an enourmous spreadsheet using various rules. First I would like to find all of the blank cells.
Currently I am trying something like this using pandas:
excel_file = <file name>
df = pd.read_excel(excel_file)
for item in df.items():
if item == "":
<add cell to list of blank cells>
Upvotes: 0
Views: 70
Reputation: 29
If you need to make a list of empty cells as Excel references try this code:
def ExcelNullCells(excel_file_path):
# important note this function assume the first row will be a header in excel file and your data start from first column in excel
import pandas as pd
import numpy as np
# making function to confert decimel column number to excel column referance
def DEC2Alpha(x):
import numpy as np
alpha="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
lst=[[0,r:=x%26,x:=x//26]]
n=0
while lst[n][2]>1:
lst.append([n:=lst[n][0]+1, r:=x%26,x:=x//26])
tr_arr= np.array(lst).T
last_remin=tr_arr[2][len(tr_arr[0])-1]
res_num= np.flip(np.append(tr_arr[1],last_remin))
res_num[len(res_num)-1]=res_num[len(res_num)-1]+1
if res_num[0]==0:
res_num=res_num[1:]
res_alpha="".join([alpha[num-1] for num in res_num])
return res_alpha
#reading excel file and convert it to array
path = excel_file_path
data = pd.read_excel(path)
data_np= np.array(data)
#checking Null value in array and return Null cells value as excel cell referance
a=pd.isna(data)
nalist=np.where(a)
excel_column=list(map(DEC2Alpha,nalist[1]))
excel_row=list(map(lambda row:row+2,nalist[0]))
excel_cell=list(map(lambda val :val[0]+str(val[1]), zip(excel_column, excel_row)))
return excel_cell
Upvotes: -1
Reputation: 51
If you want to use the dataframe, for finding empty cells you can use numpy.where. If you use numpy.where(df = '') you can get indices of rows and columns where the cells are empty:
data = {
'A': [1, 2, '', 4],
'B': [5, '', 7, 8],
'C': [9, 10, 11, '']
}
df = pd.DataFrame(data)
empty_string_indices = np.where(df == '')
empty_string_list = list(zip(empty_string_indices[0], empty_string_indices[1]))
print(empty_string_list)
And the output would be:
[(1, 1), (2, 0), (3, 2)]
This is the solution for empty strings specifically. If your empty cells actually contain NaN values, use np.where(pd.isnull(df)) instead
Upvotes: 4