Tigerfire176
Tigerfire176

Reputation: 5

Extracting CSV files from multiple .xlxs files with multiple worksheets

For the purpose of writing my thesis mechanical engineering i recieved a lot of sensor data that is configured in multiple excel files(100) with multiple sheets(22). Now I want to visualize this into power Bi but the .xlxs files a working way to slow so i want all the data(sheets) in seperate CSV files. I don't have any real experience with programming but will be able to run a script in jupyter or spyder.

I tried a code in VBA that configured multiple excels into csv, but this only worked for the first sheet in the .xlsx file.

I also used the code below in jupyter notebook; but this gives me all the sheets for one single excel.

data = pd.read_excel('file_name.file_format', sheet_name=None)

for sheet_name, df in data.items():
    df.to_csv(f'{sheet_name}.csv')

Does anyone have a code for this purpose or does someone know how to adjust the code above to do this with all excel files in a folder?

Upvotes: 0

Views: 331

Answers (3)

Tigerfire176
Tigerfire176

Reputation: 5

Some files unfortunatly have extra sensors & data which means extra workssheets. But i got this code working now:

import os
import pandas as pd

directory = "./"
files = os.listdir(directory)
for xlxs_file in files:
    if ".xlsx" in xlxs_file:
        filename = xlxs_file.strip(".xlsx")
        xlxs_file = os.path.join(directory, xlxs_file)
        data = pd.read_excel(xlxs_file, sheet_name=None)
        for sheet_name, df in data.items():
            df.to_csv("{}-{}.csv".format(filename, sheet_name))

Upvotes: 0

Matthew King
Matthew King

Reputation: 644

You could try looping through the directory for each file that has .xlxs, just replace YOUR_DIR with your own path to folder containing the files.

I've added "filename" which is just filename without extension so you can add this to .csv filename

import os

directory = "\\YOUR_DIR\\HERE"
files = os.listdir(directory)

for xlxs_file in files:
    if ".xlxs" in xlxs_file:

        filename = xlxs_file.strip(".xlxs")
        xlxs_file = directory + "\\" + xlxs_file

        data = pd.read_excel(xlxs_file, sheet_name=None)
        for sheet_name, df in data.items():
            df.to_csv(f'{filename}_{sheet_name}.csv')

Upvotes: 0

Paul Wilson
Paul Wilson

Reputation: 560

So long as the sheetnames are the same in each file then this should work:

import os
import pandas as pd

# target directory where the workbooks lie
tgt_dir = r'paste\directory\here\make\sure\to\keep\letter\r\before\quote'

# list of any files within the dir that have .xlsx in them
list_xl_files = [f for f in os.listdir(tgt_dir) if '.xlsx' in f.lower()]

# type a list of the sheets you want to target and extract
list_target_sheets = ['Sheet1', 'Sheet2', 'etc']

# iterate through each file and for each sheet in target sheets
for xl_file in list_xl_files:
    for sheet in list_target_sheets:
        
        # read in the file and target sheet
        df = pd.read_excel(tgt_dir+'\\'+xl_file, sheet_name=sheet)
        
        # export to csv but replace .xlsx with nothing 
        # then add _sheetname.csv so the filename shows the sheet too
        df.to_csv(tgt_dir+'\\'+xl_file.replace('.xlsx','')+'_'+sheet_name+'.csv')

Upvotes: 1

Related Questions