Reputation: 35321
I want to write a Python script that reads in an Excel spreadsheet and saves some of its worksheets as CSV files.
How can I do this?
I have found third-party modules for reading and writing Excel files from Python, but as far as I can tell, they can only save files in Excel (i.e. *.xls) format. If I'm wrong here, some example code showing how to do what I'm trying to do with these modules would be appreciated.
I also came across one solution that I can't quite understand, but it seems to be Windows-specific and therefore would not help me anyway, since I want to do this on Unix. At any rate, it's not clear to me that this solution can be extended to do what I want, even under Windows.
Upvotes: 32
Views: 93212
Reputation: 3148
Using pandas will be a bit shorter:
import pandas as pd
df = pd.read_excel('my_file', sheet_name='my_sheet_name') # sheet_name is optional
df.to_csv('output_file_name', index=False) # index=False prevents pandas from writing a row index to the CSV.
# one-liner
pd.read_excel('my_file', sheet_name='my_sheet_name').to_csv('output_file_name', index=False)
Upvotes: 18
Reputation: 8407
As of December 2021 and Python 3:
The openpyxl API has changed sufficiently (see https://openpyxl.readthedocs.io/en/stable/usage.html) that I have updated this part of the answer by @Boud (now @Zeugma?), as follows:
import openpyxl
import csv
wb = openpyxl.load_workbook('test.xlsx')
sh = wb.active # was .get_active_sheet()
with open('test.csv', 'w', newline="") as file_handle:
    csv_writer = csv.writer(file_handle)
    for row in sh.iter_rows(): # generator; was sh.rows
        csv_writer.writerow([cell.value for cell in row])
@Leonid made some helpful comments. In particular, csv.writer provides some additional options, e.g. a custom delimiter:
csv_writer = csv.writer(file_handle, delimiter='|', quotechar='"', quoting=csv.QUOTE_MINIMAL)
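To see what those options do without touching a real spreadsheet, here is a minimal sketch that writes a few made-up rows to an in-memory buffer. The helper name rows_to_delimited_text and the sample rows are only for illustration; they are not part of openpyxl or of the answer above.

```python
import csv
import io


def rows_to_delimited_text(rows, delimiter="|"):
    """Serialize rows with a custom delimiter, quoting only fields that need it."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,
    )
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows standing in for worksheet cell values.
sample_rows = [
    ["id", "note"],
    [1, "plain text"],
    [2, "contains | the delimiter"],  # this field must be quoted
    [3, None],                        # None is written as an empty field
]

text = rows_to_delimited_text(sample_rows)
print(text)
```

With QUOTE_MINIMAL only the field that contains the delimiter is wrapped in quotes; the other fields are written bare.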
HTH
Upvotes: 17
Reputation: 32105
The most basic examples using the two libraries, xlrd (for .xls files) and openpyxl (for .xlsx files):
import xlrd
import csv
with xlrd.open_workbook('a_file.xls') as wb:
    sh = wb.sheet_by_index(0) # or wb.sheet_by_name('name_of_the_sheet_here')
    with open('a_file.csv', 'w', newline="") as f: # on Python 2: open('a_file.csv', 'wb')
        c = csv.writer(f)
        for r in range(sh.nrows):
            c.writerow(sh.row_values(r))
# The same with openpyxl, for .xlsx files:
import openpyxl
import csv
wb = openpyxl.load_workbook('test.xlsx')
sh = wb.active
with open('test.csv', 'w', newline="") as f: # on Python 2: open('test.csv', 'wb')
    c = csv.writer(f)
    for r in sh.rows:
        c.writerow([cell.value for cell in r])
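Since the question asks about saving several worksheets rather than just the active one, the openpyxl example can be extended roughly as follows. This is a sketch, not a tested recipe for every workbook: the function name export_sheets, the one-file-per-sheet naming scheme and the UTF-8 encoding are my own assumptions. It expects a workbook already loaded with openpyxl.load_workbook and relies on iter_rows(values_only=True), which is available in recent openpyxl releases.

```python
import csv
from pathlib import Path


def export_sheets(workbook, sheet_names, out_dir):
    """Write each named worksheet to <out_dir>/<sheet name>.csv.

    `workbook` is assumed to be an openpyxl Workbook; the function name
    and the file naming scheme are illustrative choices.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in sheet_names:
        sheet = workbook[name]  # openpyxl raises KeyError for an unknown name
        target = out_dir / f"{name}.csv"
        with open(target, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            # values_only=True yields plain cell values instead of Cell objects
            for row in sheet.iter_rows(values_only=True):
                writer.writerow(row)
        written.append(target)
    return written
```

Typical use would be wb = openpyxl.load_workbook('test.xlsx') followed by export_sheets(wb, ['Sheet1', 'Sheet3'], 'csv_out'), where the sheet names and the output directory are placeholders; wb.sheetnames lists the names that actually exist in the workbook.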
Upvotes: 65
Reputation: 795
First read your Excel spreadsheet into pandas. With sheet_name=None, the code below loads the workbook as a dict (an OrderedDict in older pandas versions) that maps each worksheet name to a DataFrame. Then use the worksheet name as a key to access a specific worksheet as a DataFrame, and save only the required worksheet as a CSV file with to_csv(). Hope this works in your case.
import pandas as pd
df = pd.read_excel('YourExcel.xlsx', sheet_name=None)
df['worksheet_name'].to_csv('output.csv')
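If more than one worksheet is needed, the same dictionary can be looped over. The sketch below assumes a mapping of worksheet names to DataFrames like the one read above; the helper name save_sheets_as_csv and the choice to name each file after its worksheet are hypothetical.

```python
from pathlib import Path


def save_sheets_as_csv(sheets, wanted, out_dir):
    """Save the selected worksheets, one CSV file per worksheet.

    `sheets` is assumed to be the name -> DataFrame mapping returned by
    pd.read_excel(..., sheet_name=None); the helper itself is illustrative.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in wanted:
        if name not in sheets:
            raise KeyError(f"no worksheet named {name!r}")
        target = out_dir / f"{name}.csv"
        # index=False keeps the pandas row index out of the CSV file
        sheets[name].to_csv(target, index=False)
        written.append(target)
    return written
```

For example, save_sheets_as_csv(df, ['worksheet_name', 'another_sheet'], 'csv_out') would write csv_out/worksheet_name.csv and csv_out/another_sheet.csv, with 'another_sheet' and 'csv_out' being placeholders.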
Upvotes: 0
Reputation: 295678
Use the xlrd or openpyxl module to read .xls or .xlsx documents respectively, and the csv module to write.
Alternatively, if using Jython, you can use the Apache POI library to read either .xls or .xlsx, and the native csv module will still be available.
Upvotes: 5