Ash15
Ash15

Reputation: 25

Convert Excel to csv UTF-8 script issue

So something very weird is happening to me and I am unable to find the resolution for this. I have a script that converts excel to csv UTF-8 file which goes like this:

$root = "D:\Users\ashish.sinha\Downloads\Trail Overview"
#open excel object and get excel name

$excel = new-object -comobject excel.application

$excel.Visible=$True

$excel.DisplayAlerts = $False

$excelFiles = Get-ChildItem -Path $root -FIlter *.xlsx

$workbook = $excel.workbooks.open($excelFiles.FullName)

$worksheet = $workbook.worksheets.item(1)


$csvName = $excelFiles.FullName -replace 'xlsx','csv'

$worksheet.SaveAs($csvName,6)

$workbook.save()

$workbook.close()

$excel.quit()

When i run this code in parts like first 5-6 lines it works but when i run the full code(not saving it just F5) then i get the error like this:

Exception from HRESULT: 0x800AC472
At line:14 char:1
+ $worksheet.SaveAs($csvName,6)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (:) [], COMException
    + FullyQualifiedErrorId : System.Runtime.InteropServices.COMException

Can someone please help?

Thanks, Ashish

Upvotes: 0

Views: 1730

Answers (1)

Theo
Theo

Reputation: 61148

When using Excel up to version 2016, there is no way to convert an xlsx sheet to a UTF-8 encoded csv file built in. My earlier comment about a format specifier xlCSVUTF8 is for later versions I guess (the docs do not reveal at what point Microsoft implemented that constant..)

So, to convert to UTF-8 csv file, I have created this function:

function Convert-ExcelToCsv {
    # converts a worksheet from .xsl and .xslx files to Csv files in UTF-8 encoding
    [CmdletBinding()]
    Param(
        [ValidateScript({Test-Path $_ -PathType Leaf})]
        [Parameter(Position = 0, Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
        [Alias("FilePath", "FullName")]
        [string[]]$Path,

        [Parameter(Mandatory = $false)]
        [int]$SheetNumber = 1,

        [Parameter(Mandatory = $false)]
        [char]$Delimiter = ','

    )
    begin {
        try {
            $excel = New-Object -ComObject Excel.Application -ErrorAction Stop -Verbose:$false
            $excel.Visible = $false 
            $excel.DisplayAlerts = $false
        }
        catch { 
            throw "This function needs Microsoft Excel to be installed." 
        }
    }

    process {
        foreach ($xlFile in $Path) {
            Write-Verbose "Processing '$xlFile'"
            # convert Excel file to CSV file UTF-8
            $workbook = $excel.Workbooks.Open($xlFile)

            # set the active worksheet
            if ($SheetNumber -notin 1..@($workbook.Sheets).Count) { $SheetNumber = 1 }
            $workbook.Worksheets.Item($SheetNumber).Activate()

            # Unfortunately, Excel up to and including version 2016 has no option to export csv format in UTF8 encoding 
            # so we save as 'Unicode Text (*.txt)' (= Tab delimited)
            # See: https://msdn.microsoft.com/en-us/library/microsoft.office.tools.excel.workbook.saveas.aspx

            # Apparently, at some point (version 2019?) there is a new format specifier called xlCSVUTF8 (value 62),
            # but I can't find anywhere as of which version this is a valid value. It certainly doesn't exist in 
            # versions up to and including version 2016.
            # see https://learn.microsoft.com/en-us/office/vba/api/excel.xlfileformat

            # create a temporary file to store the in-between result
            $tempFile = [System.IO.Path]::ChangeExtension([System.IO.Path]::GetTempFileName(), ".txt")
            if (Test-Path -Path $tempFile -PathType Leaf) { Remove-Item -Path $tempFile -Force }

            $xlUnicodeText = 42         # Tab-delimited. See: https://msdn.microsoft.com/en-us/library/bb241279.aspx
            $workbook.SaveAs($tempFile, $xlUnicodeText) 
            $workbook.Saved = $true
            $workbook.Close()

            # now import, delete the temp file and save as Csv in UTF-8 encoding
            $result = Import-Csv -Path $tempFile -Encoding Unicode -Delimiter "`t" -ErrorAction SilentlyContinue -WarningAction SilentlyContinue
            Remove-Item -Path $tempFile -Force
            $csvFile = [System.IO.Path]::ChangeExtension($xlFile, ".csv")
            Write-Verbose "Creating '$csvFile'"
            $result | Export-Csv -Path $csvFile -Delimiter $Delimiter -Encoding UTF8 -NoTypeInformation -Force
        }
    }

    end {
        Write-Verbose "Quit and cleanup"
        $excel.Quit()

        # cleanup COM objects
        [System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbook) | Out-Null
        [System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel) | Out-Null
        [System.GC]::Collect()
        [System.GC]::WaitForPendingFinalizers()
    }
}

With this function in place at the top of your script, you can use it like

$root = "D:\Users\ashish.sinha\Downloads\Trail Overview"
Get-ChildItem -Path $root -Filter '*.xlsx' | Convert-ExcelToCsv -Verbose

Leave out switch -Verbose if you do not wish to see informative messages on screen

Upvotes: 1

Related Questions