user30958
user30958

Reputation: 53

Converting a folder of Excel files into CSV files/Merge Excel Workbooks

I have a folder with a large number of Excel workbooks. Is there a way to convert every file in this folder into a CSV file using Python's xlrd, xlutiles, and xlsxWriter?

I would like the newly converted CSV files to have the extension '_convert.csv'.

OTHERWISE...

Is there a way to merge all the Excel workbooks in the folder to create one large file?

I've been searching for ways to do both, but nothing has worked...

Upvotes: 1

Views: 4222

Answers (5)

chfw
chfw

Reputation: 4592

I will give a try with my library pyexcel:

from pyexcel import Book, BookWriter
import glob
import os


for f in glob.glob("your_directory/*.xlsx"):
    fullname = os.path.abspath(f)
    converted_filename = fullname.replace(".xlsx", "_converted.csv")
    book = Book(f)
    converted_csvs = BookWriter(converted_filename)
    converted_csvs.write_book_reader(book)
    converted_csvs.close()

If you have a xlsx that has more than 2 sheets, I imagine you will have more than 2 csv files generated. The naming convention is: "file_converted_%s.csv" % your_sheet_name. The script will save all converted csv files in the same directory where you had xlsx files.

In addition, if you want to merge all in one, it is super easy as well.

from pyexcel.cookbook import merge_all_to_a_book
import glob


merge_all_to_a_book(glob.glob("your_directory/*.xlsx"), "output.xlsx")

If you want to do more, please read the tutorial

Upvotes: 1

CCKx
CCKx

Reputation: 1343

You can use this function to read the data from each file

import xlrd

def getXLData(Filename, min_row_len=1, get_datemode=False, sheetnum=0):
  Data = []
  book = xlrd.open_workbook(Filename)
  sheet = book.sheets()[sheetnum] 
  rowcount = 0
  while rowcount < sheet.nrows:
    row = sheet.row_values(rowcount)
    if len(row)>=min_row_len: Data.append(row)
    rowcount+=1
  if get_datemode: return Data, book.datemode
  else: return Data

and this function to write the data after you combine the lists together

import csv

def writeCSVFile(filename, data, headers = []):
  import csv
  if headers:
    temp = [headers]
    temp.extend(data)
    data = temp
  f = open(filename,"wb")
  writer = csv.writer(f)
  writer.writerows(data)
  f.close()

Keep in mind you may have to re-format the data, especially if there are dates or integers in the Excel files since they're stored as floating point numbers.

Edited to add code calling the above functions:

import glob

filelist = glob.glob("*.xls*")
alldata = []
headers = []
for filename in filelist:
  data = getXLData(filename)
  headers = data.pop(0) # omit this line if files do not have a header row
  alldata.extend(data)

writeCSVFile("Output.csv", alldata, headers)

Upvotes: 0

Mark Tolonen
Mark Tolonen

Reputation: 177941

Using pywin32, this will find all the .xlsx files in the indicated directory and open and resave them as .csv. It is relatively easy to figure out the right commands with pywin32...just record an Excel macro and perform the open/save manually, then look at the resulting macro.

import os
import glob
import win32com.client

xl = win32com.client.gencache.EnsureDispatch('Excel.Application')

for f in glob.glob('tmp/*.xlsx'):
    fullname = os.path.abspath(f)
    xl.Workbooks.Open(fullname)
    xl.ActiveWorkbook.SaveAs(Filename=fullname.replace('.xlsx','.csv'),
                             FileFormat=win32com.client.constants.xlCSVMSDOS,
                             CreateBackup=False)
    xl.ActiveWorkbook.Close(SaveChanges=False)

Upvotes: 1

Aaron Altman
Aaron Altman

Reputation: 1755

Sure. Iterate over your files using something like glob and feed them into one of the modules you mention. With xlrd, you'd use open_workbook to open each file by name. That will give you back a Book object. You'll then want to have nested loops that iterate over each Sheet object in the Book, each row in the Sheet, and each Cell in the Row. If your rows aren't too wide, you can append each Cell in a Row into a Python list and then feed that list to the writerow method of a csv.writer object.

Since it's a high-level question, this answer glosses over some specifics like how to call xlrd.open_workbook and how to create a csv.writer. Hopefully googling for examples on those specific points will get you where you need to go.

Upvotes: 0

user2867522
user2867522

Reputation:

Look at openoffice's python library. Although, I suspect openoffice would support MS document files.

Python has no native support for Excel file.

Upvotes: 0

Related Questions