LinnK
LinnK

Reputation: 395

Loop through Dataframes in Dict of Dataframes

I have an excel file with multiple sheets that I'm reading into Python using Pandas. I'm currently using the command

raw_data=pd.read_excel(...) 

to read in data from the file (into a DataFrame), after which I'm performing a number of calculations to the DataFrame. See the code below:

raw_data = pd.read_excel(r'/Users/linnk/Desktop/Results/01_05_2012 Raw Results.xls', skiprows=1, header=0, nrows=1440)   
raw_data[u'Time']= pd.to_datetime(raw_data[u'Time'], unit='d')   # Converts first column to datetime, to make averaging easier                                                           
raw_data.set_index(pd.DatetimeIndex(raw_data[u'Time']), inplace=True)
ave_data = raw_data.resample('h', how='mean')

raw_data.Time = pd.to_datetime(raw_data.Time)
def time_cat(t):
    hour = t.hour
    if(hour >= 5 and hour < 9):
        return 'Morning (5AM-9AM)'
    elif(hour >= 9 and hour < 18):
        return 'Day (9AM-6PM)'
    elif(hour >= 18 and hour < 22):
        return 'Evening (6PM-10PM)'
    else:
        return 'Night (10PM-5AM)'
ave_by_timeofday = raw_data.groupby(raw_data.Time.apply(time_cat)).mean()

For information, the names of the sheets in the Excel file are not 'Sheet1', ' Sheet2' ect. and will change as I run this code over other Excel files (which I do need to do). All the sheets contain similar data: a date-and-time index column, a column name and then numerical data (positive and negative). The different sheets might have different numbers of columns.

What I really need to do is perform the above calculations/Dataframe manipulations to all the sheets in my Excel file. As far as I understand this means I need to change

pd.read_excel() 

to

pd.read_excel(..., sheetname= 0)

This, however turns the raw_data variable into a dict of DataFrame's, instead of just a DataFrame. Can someone help me create some code that will loop through this Dict of DataFrame's, and perform the calculations/manipulation shown in the code above for each DataFrame/Sheet? I unfortunately haven't found any resources online that help me understand this.

It would also be great is someone could help me figure out a way to export each of these manipulated DataFrame's to SQL. I'm having a hard time thinking of a way to do that would be without overwriting previous data.

Let me know if you need me to post additional information such as Excel file data or current outputs from my code. Thank you!

Upvotes: 0

Views: 1427

Answers (1)

Woody Pride
Woody Pride

Reputation: 13955

I think it would probably be better to create an ExcelFile object. I cannot solve your actual problem as there is no example data, but in general my strategy is as follows:

EF = pd.ExcelFile('path/to/file.xlsx')

The ExcelFile object has an attribute sheet_names that returns a list with all the sheets in the ExcelFile object. Each of these sheets can be separately parsed using parse. Each sheet is then returned as a data frame. So you can iterate as follows:

for sheet_name in EF.sheet_names:
    temp_df = EF.parse(sheet_name)
    #perform your calculations 
    #export to SQL

Hope this is of some use...

Upvotes: 1

Related Questions