Reputation: 2219
I have hundreds of excel files and I want to create a df that has the name of the excel file in one column and then the name of each tab in the second column. My script will iterate through each file name but the way I am appending is not right, and I'm not having much luck finding a solution.
os.chdir(r'C:\Users\mbobak\Documents\\')
FileList = glob.glob('*.xlsx')
tabs= pd.DataFrame(columns=['filename','tabs'])
for filename in FileList:
xl = pd.ExcelFile(filename).sheet_names
tabs= tabs.append([filename,xl])
desired output:
filename tabs
doc1.xlsx tab1
doc1.xlsx tab2
doc1.xlsx tab3
doc1.xlsx tab4
doc2.xlsx tab1
doc2.xlsx tab2
doc2.xlsx tab3
Upvotes: 2
Views: 3795
Reputation: 164673
Here is one way. It is good practice, and efficient, to create your dataframe in a single step. This is because appending to a list is cheaper than appending to a dataframe.
FileList = glob.glob('*.xlsx')
def return_files_tabs(FileList):
for filename in FileList:
for sheet in pd.ExcelFile(filename).sheet_names:
yield [filename, sheet]
df = pd.DataFrame(list(return_files_tabs(FileList)),
columns=['Filename', 'Tab'])
Explanation
sheet_names
attribute.Upvotes: 5
Reputation: 11613
I like @jpp's answer but if you don't want to use a generator this is how I would do it:
FileList = glob.glob('*.xlsx')
filenames = []
sheets = []
for filename in FileList:
for sheet in pd.ExcelFile(filename).sheet_names:
filenames.append(filename)
sheets.append(sheet)
df = pd.DataFrame(
{'Filename': filenames, 'Tab': sheets}
)
Upvotes: 3