goidelg
goidelg

Reputation: 326

Saving XLSX workbooks as multiple CSV files

Trying to save Excel files with multiple sheets as corresponding CSV files. I tried the following method:

import xlrd
from openpyxl import Workbook, load_workbook
import pathlib
import shutil
import pandas as pd

def strip_xlsx(inputdir, file_name, targetdir):
    wb = load_workbook(inputdir)
    sheets = wb.sheetnames
    for s in sheets:
        temp_df = pd.read_excel(inputdir, sheet_name=s)
        temp_df.to_csv(targetdir + "/" + file_name.strip(".xlsx") + "_" + s + ".csv", encoding='utf-8-sig')

Where inputdir is an absolute path to a the Excel file (say: "/Users/me/test/t.xlsx"), file_name is just the name of the file ("t.xlsx") and target_dir is a path to which I wish to save the csv files.

The methods works well, thought super slow. I'm a newbie to Python and feel like I implemented the method in a very inefficient way.

Would appreciate tips from the masters.

Upvotes: 0

Views: 254

Answers (2)

bherbruck
bherbruck

Reputation: 2226

You may have better luck if you keep everything in pandas. I see you are using openpyxl to get the sheet names, you can do this in pandas. As for speed, you'll just have to see:

EDIT:

As Charlie (the person who probably knows the most about openpyxl on the planet) pointed out, using just openpyxl will be faster. In this case about 25% faster (9.29 ms -> 6.87 ms for my two-sheet test):

from os import path, mkdir
from openpyxl import load_workbook
import csv

def xlsx_to_multi_csv(xlsx_path: str, out_dir: str = '.') -> None:
    """Write each sheet of an Excel file to a csv
    """
    # make the out directory if it does not exist (this is not EAFP)
    if not path.exists(out_dir):
        mkdir(out_dir)
    # set the prefix
    prefix = path.splitext(xlsx_path)[0]
    # load the workbook
    wb = load_workbook(xlsx_path, read_only=True)
    for sheet_name in wb.sheetnames:
        # generate the out path
        out_path = path.join(out_dir, f'{prefix}_{sheet_name}.csv')
        # open that file
        with open(out_path, 'w', newline='') as file:
            # create the writer
            writer = csv.writer(file)
            # get the sheet
            sheet = wb[sheet_name]
            for row in sheet.rows:
                # write each row to the csv
                writer.writerow([cell.value for cell in row])

xlsx_to_multi_csv('data.xlsx')

Upvotes: 2

Boskosnitch
Boskosnitch

Reputation: 774

You just need to specify a path to save the csv's to, and iterate through a dictionary created by pandas to save the frames to the directory.

csv_path = '\path\to\dir'
for name,df in pd.read_excel('xl_path',sheet_name=None).items():
    df.to_excel(os.path.join(csv_path,name)

Upvotes: 1

Related Questions