Blabla3
Blabla3

Reputation: 11

Error when loading Excel with several worksheets

I have a large number of .xlsx files downloaded from an external database which I want to work with. It has two worksheets, the first worksheet only has some comments on the data and the second one contains the data.

I've tried opening the excel spreadsheet using the following two options, but they both give me an error. The error disappears when I delete the first worksheet. But since I have >350 files, I don't want to delete the all those worksheets manually.

The code I tried

    from openpyxl import load_workbook
    wb = load_workbook('/Users/user/Desktop/Data_14.xlsx')

Which gives the error:

    InvalidFileException: "There is no item named 'xl/styles.xml' in the archive"

And:

    from xlrd import open_workbook
    book = open_workbook('/Users/user/Desktop/Data_14.xlsx')

which gives a very long error message (KeyError: 9)

I think the problem is a formula error in the first excel worksheet. One cell in the worksheet says

- minimum percentage that must characterise the path from a subject Company up to its Ultimate owner: 50.01%                    

but it is not formatted as text. Executing the cell gives an error message in Excel. Inserting an " ' " to make it text lets me then open the file with python which is what I want to do.

Any ideas on how I can open the excel files automatically to solve this problem?

Upvotes: 0

Views: 657

Answers (2)

Mark
Mark

Reputation: 150

Solution:

I've named the script delsheet.py and placed it in a directory also containing the excel files.

  • I'm using Python 3.4.3 and Openpyxl 2.3.0 but this should work for Openpyxl 2.0+
  • I am on a Mac OS X running Yosemite.

  • Knowing your versions and settings would be useful because openpyxl can be fickle with syntax depending on the version.

  • Worksheet names, either I over looked or you failed to mention if the first worksheet in the Excel files have unique names or if they are all the same.
  • If they are all the same then that is convenient and if all the first sheets are named 'Sheet1' then this script will work as is, and that is how you worded the question so this is how I've written the solution; if different please clarify. Thanks.

  • Understanding the script:

    1. First the script stores the path of the script location to know which directory it is being called from and therefore located.

    2. From that location the script lists the files in the same directory with the file extension .xlsx appending them to the list 'spreadsheet_list'

    3. Using a for loop and getting the number of elements in the list 'spreadsheet_list' lets the script know how long to iterate through the elements in the list.

      • the loop loads in an excel file from the list
      • removes 'sheet1'
      • saves the spreadsheet with the same original filename.

delsheet.py

#!/usr/bin/env python3
# Using python 3.4.3 and openpyxl 2.3.0
# Remove the first worksheet from a batch of excel sheets

# Import Modules
import sys, os, re
from openpyxl import Workbook, load_workbook

# Create List
spreadsheet_list = []

# Script path to the directory it is located.
pcwd=os.path.dirname(os.path.abspath(__file__))

# List files in directory by file extension
# Specify directory
items = os.listdir(pcwd)

# Specify extension in "if" loop and append the files in the directory to the "spreadsheet_list" list.
for names in items:
    if names.endswith(".xlsx"):
        spreadsheet_list.append(names)

# Debugging purposes: print out the list of appended excel files in script directory
# print(spreadsheet_list) 

# For loop, using the number of elements in the spreadsheet_list we can determine how long the loop should go
for i in range(len(spreadsheet_list)):
    # print(i) to see that i is = to the number of excel files located in the directory
    # Load workbook into memory (Opening the Excel file automatically...)
    wb = load_workbook(spreadsheet_list[int(i)])
    ## Store Sheet1 in the workbook as 'ws'
    ws = wb['Sheet1']
    ## Remove the worksheet 'ws'
    wb.remove_sheet(ws)
    ## Save the edited excel sheets (with the original name)
    wb.save(spreadsheet_list[int(i)])

Upvotes: 1

user5656611
user5656611

Reputation:

Please try this add-in to merge all 2nd sheets.

http://www.rondebruin.nl/win/addins/rdbmerge.htm

Or, run this script to delete all first sheets in all workbooks . . .

Sub Example()
    Dim MyPath As String, FilesInPath As String
    Dim MyFiles() As String, Fnum As Long
    Dim mybook As Workbook
    Dim CalcMode As Long
    Dim sh As Worksheet
    Dim ErrorYes As Boolean

    Application.DisplayAlerts = False
    'Fill in the path\folder where the files are
    MyPath = "C:\Users\rshuell001\Desktop\excel\"

    'Add a slash at the end if the user forget it
    If Right(MyPath, 1) <> "\" Then
        MyPath = MyPath & "\"
    End If

    'If there are no Excel files in the folder exit the sub
    FilesInPath = Dir(MyPath & "*.xl*")
    If FilesInPath = "" Then
        MsgBox "No files found"
        Exit Sub
    End If

    'Fill the array(myFiles)with the list of Excel files in the folder
    Fnum = 0
    Do While FilesInPath <> ""
        Fnum = Fnum + 1
        ReDim Preserve MyFiles(1 To Fnum)
        MyFiles(Fnum) = FilesInPath
        FilesInPath = Dir()
    Loop

    'Change ScreenUpdating, Calculation and EnableEvents
    With Application
        CalcMode = .Calculation
        .Calculation = xlCalculationManual
        .ScreenUpdating = False
        .EnableEvents = False
    End With

    'Loop through all files in the array(myFiles)
    If Fnum > 0 Then
        For Fnum = LBound(MyFiles) To UBound(MyFiles)
            Set mybook = Nothing
            On Error Resume Next
            Set mybook = Workbooks.Open(MyPath & MyFiles(Fnum))
            On Error GoTo 0

            If Not mybook Is Nothing Then


                'Change cell value(s) in one worksheet in mybook
                On Error Resume Next
                With mybook.Worksheets(1)
                    ActiveSheet.Delete
                End With


                If Err.Number > 0 Then
                    ErrorYes = True
                    Err.Clear
                    'Close mybook without saving
                    mybook.Close savechanges:=False
                Else
                    'Save and close mybook
                    mybook.Close savechanges:=True
                End If
                On Error GoTo 0
            Else
                'Not possible to open the workbook
                ErrorYes = True
            End If

        Next Fnum
    End If

    If ErrorYes = True Then
        MsgBox "There are problems in one or more files, possible problem:" _
             & vbNewLine & "protected workbook/sheet or a sheet/range that not exist"
    End If

    'Restore ScreenUpdating, Calculation and EnableEvents
    With Application
        .ScreenUpdating = True
        .EnableEvents = True
        .Calculation = CalcMode
    End With
    Application.DisplayAlerts = True
End Sub

Upvotes: 0

Related Questions