Reputation: 135
I have an excel spreedsheet with multiple workbooks, and each workbook has the following attributes:
data = pd.DataFrame({
'start date': ['1988/01/12', '1988/01/13', '1988/01/14', '1989/01/20','1990/01/01'],
'end date': ['1989/01/12', '1989/01/12', '1989/01/12', '1990/01/20','1991/01/01'],
'value': [11558522, 12323552, NaN, 18412280, NaN]
})
I want to loop through each sheet in the workbook and:
I want a summary spreedsheet/dataframe where iIhave the sheetname(id), start date, end date and the number of missing values like this:
data = pd.DataFrame({
sheet name': ['sheet 1', 'sheet 2', 'sheet 3', 'sheet 4','sheet5'],
'start date': ['1988/01/12', '1988/01/13', '1988/01/14', '1989/01/20','1990/01/01'],
'end date': ['1989/01/12', '1989/01/12', '1989/01/12', '1990/01/20','1991/01/01'],
'missing values': [3, 10, 1, 43, 8]
})
Upvotes: 2
Views: 700
Reputation: 862601
Use sheet_name
in read_excel
for all sheets to orderdict of DataFrames, convert columns to datetimes and get minimal and maximal and count number of misisng values by sum
of mask created by Series.isna
, last append to list and last create new DataFrame
by constructor:
L = []
for s, df in pd.read_excel('test.xlsx', sheet_name=None).items():
d = {'sheet name': s,
'start date': pd.to_datetime(df['start date']).min(),
'end date': pd.to_datetime(df['end date']).max(),
'missing values':df['value'].isna().sum()}
L.append(d)
df1 = pd.DataFrame(L)
Upvotes: 1