Reputation: 187
I'm trying to combine excel data files with different dates to one file so I can do analysis using pandas package. I am having difficulties since the files are named by dates and have multiple sheets inside.
This is for an assignment to analyze the date and plot various parameters i.e, temp, atm, GHI e.t.c to the number of days/hours/minutes
import pandas as pd
import glob
all_data = pd.DataFrame() #Creating an empty dataframe
for f in glob.glob("/Data-Concentrated Solar Power-NamPower/Arandis 2016/2016 01 January/*.xlsx"): #path to datafiles and using glob to select all files with .xlsx extension
df = pd.read_excel(f)
all_data = all_data.append(df,ignore_index=True)
Upvotes: 2
Views: 1155
Reputation: 8521
Can you try the following:
import os
all_data = pd.DataFrame() #Creating an empty dataframe
for f in glob.glob("/home/humblefool/Dropbox/MSc/MSc Project/Data-Concentrated Solar Power-NamPower/Arandis 2016/2016 01 January/*.xlsx"): #path to datafiles and using glob to select all files with .xlsx extension
df = pd.ExcelFile(f).parse('Sheet1', skiprows=16)
file_date = os.path.splitext(os.path.basename(f))[0].split('_')[1]
df['file_date'] = pd.to_datetime(file_date)
all_data = pd.concat([all_data, df])
all_data = all_data.set_index('file_date').sort_index()
Upvotes: 2
Reputation: 18647
Append each file DataFrame
to a list, then use pandas.concat
to combine them all to one DataFrame
:
import pandas as pd
import glob
frames = []
for f in glob.glob("/home/humblefool/Dropbox/MSc/MSc Project/Data-Concentrated Solar Power-NamPower/Arandis 2016/2016 01 January/*.xlsx"): #path to datafiles and using glob to select all files with .xlsx extension
df = pd.read_excel(f).assign(file_name=f)
# Add date column for sorting later
df['date'] = pd.to_datetime(df.file_name.str.extract(r'(\d{4}-\d{2}-\d{2})', expand=False), errors='coerce')
frames.append(df)
all_data = pd.concat(frames, ignore_index=True).sort_values('date')
Upvotes: 2