Reputation: 89
I am trying to combine multiple sheets from different excel files into one dataframe. All the files have multiple sheets, and one of these sheets has the same name in all the files - this is the sheet that I am interested in combining into one dataframe. All the files are in the same directory.
import pandas as pd
import os, glob
os.chdir(r'c:\Users\Documents\Files')
def files(): #to select the files that have RMP and WE on their name
list_files= pd.Series()
for file in glob.glob('RMP*WE*'):
data= pd.Series(file)
list_files= list_files.append(data, ignore_index=True)
return list_files
a= files()
print("This is the variable a\n", a)
def extract_tab(): #to concatenate the sheet called Metrics that all files have
frame_files= pd.DataFrame()
try:
for file in a:
data= pd.read_excel(file,sheet_name='Metrics')
frame_files= frame_files.append(data, ignore_index=True)
except:
pass
return frame_files
b= extract_tab()
print("This is b\n",b)
The result of variable a (files function) is a list of the files that meet the naming criteria. But the result of variable b (extract_tab function) is an empty dataframe. What am I doing wrong?
I looked at this postImport multiple excel files into python pandas and concatenate them into one dataframe but it doesn't work... although I did take a couple of ideas from it.
Upvotes: 2
Views: 1986
Reputation: 23099
IIUC, you can just do a list comp over your directory.
If using Python 3.4 +
from Pathlib import Path
path_ = 'c:\Users\Documents\Files'
dfs = [pd.read_excel(f,sheet_name='metrics') for f in Path(path_).glob('RMP*WE*')]
df = pd.concat(dfs)
or if you can only use the os
module :
os.chdir('c:\Users\Documents\Files')
files = glob.glob('RMP*WE*')
dfs = [pd.read_excel(f,sheet_name='metrics') for f in files]
df = pd.concat(dfs)
If you need to handle missing sheets, this would be a nice way to do so.
def exlude_sheet(excel_list, sheet):
"""
takes two arguments:
1. A list of excel documents
2. The name of your sheet.
3. Returns a single data frame after
working through your list of excel objects.
"""
from xlrd import XLRDError
df_lists = []
for file in excel_list:
try:
file_df = pd.read_excel(file, sheet_name=sheet)
df_lists.append(file_df)
except (XLRDError) as e:
print(f"{e} skipping")
continue
try:
return pd.concat(df_lists)
except ValueError as err:
print("No Objects Matched")
xlsx = [f for f in Path(path_).glob('RMP*WE*')]
df = exlude_sheet(xlsx,sheet='Metrics')
out:
No sheet named <'Metrics'> for doc_1 skipping
No sheet named <'Metrics'> for doc_final skipping
print(df)
Column_A data
0 0 0
1 1 1
2 2 2
3 3 3
4 4 4
Testing when no matching sheets are found at all :
exlude_sheet(xlsx,'foobar')
No sheet named <'foobar'> skipping
No sheet named <'foobar'> skipping
No sheet named <'foobar'> skipping
No Objects Matched
Upvotes: 4
Reputation: 14094
Try
import pandas as pd
from glob import glob
os.chdir('c:\Users\Documents\Files')
df = pd.concat([pd.read_excel(i, sheet_name='Metrics') for i in glob('RMP*WE*')])
Upvotes: 1