How to read several xlsx-files in a folder into a pandas dataframe

I have a folder. In this folder are 48 xlsx files, but the count of the relevant files are 22. Them name of these 22 files have no structure, the only thing in common is that the filenames start with data. I would love to access this files and read them all into a dataframe. Doing this manually with the code line

df = pd.read_excel(filename, engine='openpyxl')

takes too long

The table structure is similar but not always exactly the same. How can I manage to solve this problem

Upvotes: 1

Views: 304

Answers (1)

Bilal Qandeel
Bilal Qandeel

Reputation: 727

import os
import pandas as pd

dfs = {}

def get_files(extension, location):
    xlsx_list = []

    for root, dirs, files in os.walk(location):
        for t in files:
            if t.endswith(extension):   
                xlsx_list.append(t)
    return xlsx_list

file_list = get_files('.xlsx', '.')
index = 0
for filename in file_list:
    index += 1
    df = pd.read_excel(filename, engine='openpyxl')
    dfs[filename] = df

print(dfs)

each element in dfs like dfs['file_name_here.xlsx'] accesses the data frame output from the read_excel.

EDIT: that you can add additional criteria to filter through the XLSX files at the line if t.endswith(extension): you can check out the beginning of the file like if t.startswith('data'): too. Maybe combine them if t.startswith('data') and t.endswith(extension):

Upvotes: 2

Related Questions