Reputation: 138

How to convert my xlsx files into CSV in bulk via python

I am trying to convert a list of xlsx files into csv format. at the moment i have been able to do this using xlrd and csv but file by file using the below code:

import xlrd
import csv

def csv_from_excel():
    wb = xlrd.open_workbook(r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\JUL 2018\RM 2018-07-30.xlsx")
    sh = wb.sheet_by_name('RM')
    csv_file = open('RM 2018-07-30.csv', 'w', newline='')
    wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)

    for rownum in range(sh.nrows):
        wr.writerow(sh.row_values(rownum))

    csv_file.close()

csv_from_excel()


import pandas as pd 
import numpy as np 

df = pd.read_csv('RM 2018-07-30.csv', index_col= 0, encoding = 'iso-8859-1')

df2 = df[['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No']]

df2.to_csv(r"C:\Users\jonathon.kindred\Desktop\RM\2018\JUL 2018\RM 2018-07-30.csv", index = False)

i need to be able to do this folder by folder rather than file by file. I've managed to get a list of the next folder using glob, see below:

import glob

path = r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\AUG 2018"

files = [f for f in glob.glob(path + "**/*.xlsx", recursive=True)]

for f in files:
    print(f)

The issue is that im finding it hard to combine both scripts so that it follows these steps:

converts the .xlsx into csv
selects only these columns 'Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No'
places it into my other folder.

The two folders are; the xlsx location: RM - USE and destination location: RM.

Upvotes: 1

Answers (3)

O Yahya

Reputation: 376

Use os.listdir() to get a list of all files in specific folder

put csv_from_excel() function inside a for loop to iterate through each file in the list

path = "PATH/TO/FOLDER" 
list = os.listdir(path)
for file in list:
    fileName = str(file)
    def csv_from_excel():
        wb = xlrd.open_workbook(fileName)
        sh = wb.sheet_by_name('RM')
        csv_file = open('RM 2018-07-30.csv', 'w', newline='')
        wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)

        for rownum in range(sh.nrows):
            wr.writerow(sh.row_values(rownum))

    csv_file.close()

csv_from_excel()

Update: To select multiple columns in CSV file, use pandas to store the columns content into a pandas data frame, then you can save the data frame as a CSV to a new folder

import pandas
#Store CSV columns into a pandas data frame
colNames = ['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No']
data = pandas.read_csv(fileName, names=colNames)
#Extract the CSV columns to a new CSV
df = pandas.DataFrame(data, columns = colNames)
df.to_csv('PATH/TO/NEW/CSV', index=False)

Upvotes: 2

Pablo de Luis

Reputation: 111

You can just read your .xlsx files, select your wanted columns and export to csv only using pandas

df = pd.read_excel(r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\JUL 2018\RM 2018-07-30.xlsx", sheet_name='RM', usecols=['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No'])
df.to_csv('RM 2018-07-30.csv', index=False)

If you are iterating over files and directories, glob it's perfect for it. You can also check Pathlib library, can help you to get only directories or files, extensions or combine them to get your export paths.

For example if you have your list of files using glob:

from pathlib import Path

files = glob(files_path)
output_dir = Path('destination')
for file in files:
    df = pd.read_excel(file, sheet_name='RM', usecols=['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No'])

    output_file = Path(file)
    output_file = output_dir / '{}.csv'.format(output_file.stem)

    df.to_csv(output_file, index=False)

Upvotes: 1

rajvi

Reputation: 150

need to install: $ pip install rows openpyxl

import rows
data = rows.import_from_xlsx("my_file.xlsx")
rows.export_to_csv(data, open("my_file.csv", "wb"))

Upvotes: 0

How to convert my xlsx files into CSV in bulk via python

Answers (3)

Related Questions