Tonikami04
Tonikami04

Reputation: 187

Combining data from multiple excel files in python using pandas package

I'm trying to combine excel data files with different dates to one file so I can do analysis using pandas package. I am having difficulties since the files are named by dates and have multiple sheets inside.

This is how the inside of the file looks like This is how the inside of the month folder looks like The inside of the year folder with multiples directoties

This is for an assignment to analyze the date and plot various parameters i.e, temp, atm, GHI e.t.c to the number of days/hours/minutes

import pandas as pd
import glob

all_data = pd.DataFrame() #Creating an empty dataframe
for f in glob.glob("/Data-Concentrated Solar Power-NamPower/Arandis 2016/2016 01 January/*.xlsx"): #path to datafiles and using glob to select all files with .xlsx extension
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)


Upvotes: 2

Views: 1155

Answers (2)

Jeril
Jeril

Reputation: 8521

Can you try the following:

import os
all_data = pd.DataFrame() #Creating an empty dataframe
for f in glob.glob("/home/humblefool/Dropbox/MSc/MSc Project/Data-Concentrated Solar Power-NamPower/Arandis 2016/2016 01 January/*.xlsx"): #path to datafiles and using glob to select all files with .xlsx extension
    df = pd.ExcelFile(f).parse('Sheet1', skiprows=16)
    file_date = os.path.splitext(os.path.basename(f))[0].split('_')[1]
    df['file_date'] = pd.to_datetime(file_date)
    all_data = pd.concat([all_data, df])
all_data  = all_data.set_index('file_date').sort_index()

Upvotes: 2

Chris Adams
Chris Adams

Reputation: 18647

Append each file DataFrame to a list, then use pandas.concat to combine them all to one DataFrame:

import pandas as pd
import glob

frames = []

for f in glob.glob("/home/humblefool/Dropbox/MSc/MSc Project/Data-Concentrated Solar Power-NamPower/Arandis 2016/2016 01 January/*.xlsx"): #path to datafiles and using glob to select all files with .xlsx extension
    df = pd.read_excel(f).assign(file_name=f)
    # Add date column for sorting later
    df['date'] = pd.to_datetime(df.file_name.str.extract(r'(\d{4}-\d{2}-\d{2})', expand=False), errors='coerce')
    frames.append(df)

all_data = pd.concat(frames, ignore_index=True).sort_values('date')

Upvotes: 2

Related Questions