Reputation: 11
I have a large number of .xlsx files downloaded from an external database which I want to work with. It has two worksheets, the first worksheet only has some comments on the data and the second one contains the data.
I've tried opening the excel spreadsheet using the following two options, but they both give me an error. The error disappears when I delete the first worksheet. But since I have >350 files, I don't want to delete the all those worksheets manually.
The code I tried
from openpyxl import load_workbook
wb = load_workbook('/Users/user/Desktop/Data_14.xlsx')
Which gives the error:
InvalidFileException: "There is no item named 'xl/styles.xml' in the archive"
And:
from xlrd import open_workbook
book = open_workbook('/Users/user/Desktop/Data_14.xlsx')
which gives a very long error message (KeyError: 9)
I think the problem is a formula error in the first excel worksheet. One cell in the worksheet says
- minimum percentage that must characterise the path from a subject Company up to its Ultimate owner: 50.01%
but it is not formatted as text. Executing the cell gives an error message in Excel. Inserting an " ' " to make it text lets me then open the file with python which is what I want to do.
Any ideas on how I can open the excel files automatically to solve this problem?
Upvotes: 0
Views: 657
Reputation: 150
Solution:
I've named the script delsheet.py and placed it in a directory also containing the excel files.
I am on a Mac OS X running Yosemite.
Knowing your versions and settings would be useful because openpyxl can be fickle with syntax depending on the version.
If they are all the same then that is convenient and if all the first sheets are named 'Sheet1' then this script will work as is, and that is how you worded the question so this is how I've written the solution; if different please clarify. Thanks.
Understanding the script:
First the script stores the path of the script location to know which directory it is being called from and therefore located.
From that location the script lists the files in the same directory with the file extension .xlsx appending them to the list 'spreadsheet_list'
Using a for loop and getting the number of elements in the list 'spreadsheet_list' lets the script know how long to iterate through the elements in the list.
delsheet.py
#!/usr/bin/env python3
# Using python 3.4.3 and openpyxl 2.3.0
# Remove the first worksheet from a batch of excel sheets
# Import Modules
import sys, os, re
from openpyxl import Workbook, load_workbook
# Create List
spreadsheet_list = []
# Script path to the directory it is located.
pcwd=os.path.dirname(os.path.abspath(__file__))
# List files in directory by file extension
# Specify directory
items = os.listdir(pcwd)
# Specify extension in "if" loop and append the files in the directory to the "spreadsheet_list" list.
for names in items:
if names.endswith(".xlsx"):
spreadsheet_list.append(names)
# Debugging purposes: print out the list of appended excel files in script directory
# print(spreadsheet_list)
# For loop, using the number of elements in the spreadsheet_list we can determine how long the loop should go
for i in range(len(spreadsheet_list)):
# print(i) to see that i is = to the number of excel files located in the directory
# Load workbook into memory (Opening the Excel file automatically...)
wb = load_workbook(spreadsheet_list[int(i)])
## Store Sheet1 in the workbook as 'ws'
ws = wb['Sheet1']
## Remove the worksheet 'ws'
wb.remove_sheet(ws)
## Save the edited excel sheets (with the original name)
wb.save(spreadsheet_list[int(i)])
Upvotes: 1
Reputation:
Please try this add-in to merge all 2nd sheets.
http://www.rondebruin.nl/win/addins/rdbmerge.htm
Or, run this script to delete all first sheets in all workbooks . . .
Sub Example()
Dim MyPath As String, FilesInPath As String
Dim MyFiles() As String, Fnum As Long
Dim mybook As Workbook
Dim CalcMode As Long
Dim sh As Worksheet
Dim ErrorYes As Boolean
Application.DisplayAlerts = False
'Fill in the path\folder where the files are
MyPath = "C:\Users\rshuell001\Desktop\excel\"
'Add a slash at the end if the user forget it
If Right(MyPath, 1) <> "\" Then
MyPath = MyPath & "\"
End If
'If there are no Excel files in the folder exit the sub
FilesInPath = Dir(MyPath & "*.xl*")
If FilesInPath = "" Then
MsgBox "No files found"
Exit Sub
End If
'Fill the array(myFiles)with the list of Excel files in the folder
Fnum = 0
Do While FilesInPath <> ""
Fnum = Fnum + 1
ReDim Preserve MyFiles(1 To Fnum)
MyFiles(Fnum) = FilesInPath
FilesInPath = Dir()
Loop
'Change ScreenUpdating, Calculation and EnableEvents
With Application
CalcMode = .Calculation
.Calculation = xlCalculationManual
.ScreenUpdating = False
.EnableEvents = False
End With
'Loop through all files in the array(myFiles)
If Fnum > 0 Then
For Fnum = LBound(MyFiles) To UBound(MyFiles)
Set mybook = Nothing
On Error Resume Next
Set mybook = Workbooks.Open(MyPath & MyFiles(Fnum))
On Error GoTo 0
If Not mybook Is Nothing Then
'Change cell value(s) in one worksheet in mybook
On Error Resume Next
With mybook.Worksheets(1)
ActiveSheet.Delete
End With
If Err.Number > 0 Then
ErrorYes = True
Err.Clear
'Close mybook without saving
mybook.Close savechanges:=False
Else
'Save and close mybook
mybook.Close savechanges:=True
End If
On Error GoTo 0
Else
'Not possible to open the workbook
ErrorYes = True
End If
Next Fnum
End If
If ErrorYes = True Then
MsgBox "There are problems in one or more files, possible problem:" _
& vbNewLine & "protected workbook/sheet or a sheet/range that not exist"
End If
'Restore ScreenUpdating, Calculation and EnableEvents
With Application
.ScreenUpdating = True
.EnableEvents = True
.Calculation = CalcMode
End With
Application.DisplayAlerts = True
End Sub
Upvotes: 0