Reputation: 138
I am trying to convert a list of xlsx files into csv format. at the moment i have been able to do this using xlrd and csv but file by file using the below code:
import xlrd
import csv
def csv_from_excel():
wb = xlrd.open_workbook(r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\JUL 2018\RM 2018-07-30.xlsx")
sh = wb.sheet_by_name('RM')
csv_file = open('RM 2018-07-30.csv', 'w', newline='')
wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
for rownum in range(sh.nrows):
wr.writerow(sh.row_values(rownum))
csv_file.close()
csv_from_excel()
import pandas as pd
import numpy as np
df = pd.read_csv('RM 2018-07-30.csv', index_col= 0, encoding = 'iso-8859-1')
df2 = df[['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No']]
df2.to_csv(r"C:\Users\jonathon.kindred\Desktop\RM\2018\JUL 2018\RM 2018-07-30.csv", index = False)
i need to be able to do this folder by folder rather than file by file. I've managed to get a list of the next folder using glob, see below:
import glob
path = r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\AUG 2018"
files = [f for f in glob.glob(path + "**/*.xlsx", recursive=True)]
for f in files:
print(f)
The issue is that im finding it hard to combine both scripts so that it follows these steps:
The two folders are; the xlsx location: RM - USE and destination location: RM.
Upvotes: 1
Views: 283
Reputation: 376
Use os.listdir()
to get a list of all files in specific folder
put csv_from_excel() function inside a for loop to iterate through each file in the list
path = "PATH/TO/FOLDER"
list = os.listdir(path)
for file in list:
fileName = str(file)
def csv_from_excel():
wb = xlrd.open_workbook(fileName)
sh = wb.sheet_by_name('RM')
csv_file = open('RM 2018-07-30.csv', 'w', newline='')
wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
for rownum in range(sh.nrows):
wr.writerow(sh.row_values(rownum))
csv_file.close()
csv_from_excel()
Update: To select multiple columns in CSV file, use pandas to store the columns content into a pandas data frame, then you can save the data frame as a CSV to a new folder
import pandas
#Store CSV columns into a pandas data frame
colNames = ['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No']
data = pandas.read_csv(fileName, names=colNames)
#Extract the CSV columns to a new CSV
df = pandas.DataFrame(data, columns = colNames)
df.to_csv('PATH/TO/NEW/CSV', index=False)
Upvotes: 2
Reputation: 111
You can just read your .xlsx files, select your wanted columns and export to csv only using pandas
df = pd.read_excel(r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\JUL 2018\RM 2018-07-30.xlsx", sheet_name='RM', usecols=['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No'])
df.to_csv('RM 2018-07-30.csv', index=False)
If you are iterating over files and directories, glob it's perfect for it. You can also check Pathlib library, can help you to get only directories or files, extensions or combine them to get your export paths.
For example if you have your list of files using glob:
from pathlib import Path
files = glob(files_path)
output_dir = Path('destination')
for file in files:
df = pd.read_excel(file, sheet_name='RM', usecols=['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No'])
output_file = Path(file)
output_file = output_dir / '{}.csv'.format(output_file.stem)
df.to_csv(output_file, index=False)
Upvotes: 1
Reputation: 150
need to install: $ pip install rows openpyxl
import rows
data = rows.import_from_xlsx("my_file.xlsx")
rows.export_to_csv(data, open("my_file.csv", "wb"))
Upvotes: 0