fireshark519
fireshark519

Reputation: 175

Import from Excel to SQL using SSIS

I am automating some things at work so I decided to encompass it in a SSIS package. I have been working on this for months and one of the problems I faced at the beginning resurfaced.

I receive a report through email, which is downloaded renamed and placed into L:\MACROS\SSIS\Input (this is done through a C# application I created).

I then import the data from that report into SQL.

The problem exists here, as I try to get the data from the xls file a specific column has 1 of 2 behaviours. If the top row of data is only numeric, it will automatically assign this as numeric and only import numeric values, anything non-numeric is transformed into null.

This column is the invoice number which usually is numeric, but there is a world region where they will be non-numeric (i.e: "MAGI:1326564" I get this error message when I open my data flow object:

TITLE: Microsoft Visual Studio

The metadata of the following output columns does not match the metadata of the external columns with which the output columns are associated:

Output "Excel Source Output": "F11"

Do you want to replace the metadata of the output columns with the metadata of the external columns?

------------------------------ BUTTONS:

&Yes &No

I can either get the numerics or the non-numeric values.

Now, as I wanted a permanent fix I thought about just using C# to create a separate column for non-numeric and delete them from the original column.

That way I have a reusable method of fixing the above issue.

        try
        {
            //Start Excel and get Application object.
            oXL = new Microsoft.Office.Interop.Excel.Application();
            oXL.Visible = false;

            oWB = (Microsoft.Office.Interop.Excel._Workbook)(oXL.Workbooks.Open(@"L:\MACROS\SSIS\Input\A2_POST_ADVICE_FOR_DUTY_LINES.xls"));
            oSheet = (Microsoft.Office.Interop.Excel._Worksheet)oWB.ActiveSheet;


        /*    int nInLastRow = oSheet.Cells.Find("*", System.Reflection.Missing.Value,
            System.Reflection.Missing.Value, System.Reflection.Missing.Value, Microsoft.Office.Interop.Excel.XlSearchOrder.xlByRows, Microsoft.Office.Interop.Excel.XlSearchDirection.xlPrevious, false, System.Reflection.Missing.Value, System.Reflection.Missing.Value).Row;
            */

              var j = 7;


            var cellValue = (string)(oSheet.Cells[7, 11] as Microsoft.Office.Interop.Excel.Range).Value; 


            //        while (j < 20)/*nInLastRow)*/
            //        {
            i = 0;
                foreach (char value in cellValue)
                {
                    bool digit = char.IsDigit(value);
                    if (digit == true)
                    {
                        i = i + 1;
                    }
                    else { i = i + 0; }

                }
                if (i > 1)
                {
                    oSheet.Cells[j, 22] = cellValue;
                    //oSheet.Cells[j, 11].Clear();
                }

            // Close the workbook, tell it to save and give the path.

            //   j = j + 1;
            //        }

            oXL.DisplayAlerts = false;

            oWB.SaveAs(@"L:\MACROS\SSIS\Input\A2_POST_ADVICE_FOR_DUTY_LINES.xls", Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlNoChange, Type.Missing, Type.Missing, Type.Missing,Type.Missing, Type.Missing);


            oWB.Close();

            // Now quit the application.

            oXL.Quit();

            // Call the garbage collector to collect and wait for finalizers to finish.

            GC.Collect();

            GC.WaitForPendingFinalizers();

            // Release the COM objects that have been instantiated.

            Marshal.FinalReleaseComObject(oWB);

            Marshal.FinalReleaseComObject(oSheet);
            //  Marshal.FinalReleaseComObject(oRng);

            Marshal.FinalReleaseComObject(oXL);
        }

        catch (Exception theException)
        {
            String errorMessage;
            errorMessage = "Error: ";
            errorMessage = String.Concat(errorMessage, theException.Message);
            errorMessage = String.Concat(errorMessage, " Line: ");
            errorMessage = String.Concat(errorMessage, theException.Source);

            MessageBox.Show(errorMessage, "Error");
        }

I keep getting an error message while running C#

"Cannot convert type double to string.

The code was working before implementing the loop (for 2 tries), after implemented the loop it wouldn't work anymore so I commented out the loop but I still get the same error.

I have also changed:

            var cellValue = (string)(oSheet.Cells[7, 11] as Microsoft.Office.Interop.Excel.Range).Value; 

to

var cellValue = (oSheet.Cells[7, 11] as Microsoft.Office.Interop.Excel.Range).Value.ToString();

with this change it worked for 2 tests and wouldn't work anymore.

If I change it to:

string cellValue =  "MA1352564";

it will execute what I wanted it to do so I have narrowed it down to the issue being trying to convert the value of the cell to a string so that it can there check if the characters in the string are digits or not.

I am looking for either a different solution to my import problem or any ideas on how to fix the C# section of code :)

EDIT: I forgot to mention that if I enable data viewer in the flow, the data coming out of excel is already stripped off the non-numeric data....

EDIT2:

After using the suggested options I get this error:

Error: 0xC0202009 at DataInputUni, Excel Source [12]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E21. An OLE DB record is available. Source: "Microsoft JET Database Engine" Hresult: 0x80040E21 Description: "Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.". Error: 0xC0208265 at DataInputUni, Excel Source [12]: Failed to retrieve long data for column "F11". Error: 0xC020901C at DataInputUni, Excel Source [12]: There was an error with Excel Source.Outputs[Excel Source Output].Columns[F11] on Excel Source.Outputs[Excel Source Output]. The column status returned was: "DBSTATUS_UNAVAILABLE". Error: 0xC0209029 at DataInputUni, Excel Source [12]: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "Excel Source.Outputs[Excel Source Output].Columns[F11]" failed because error code 0xC0209071 occurred, and the error row disposition on "Excel Source.Outputs[Excel Source Output].Columns[F11]" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure. Error: 0xC0047038 at DataInputUni, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on Excel Source returned error code 0xC0209029. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

Upvotes: 0

Views: 2084

Answers (3)

We found that the solution to getting the values from a mixed column of numeric strings and alpha-numeric strings was solved if the Excel format for the column is "Text" rather than "General." Without that the alphanumeric values are output as nulls.

Since we do not control creation of the workbooks and need an automated solution, I created a script task to use the Microsoft.Office.Interop.Excel library to locate the source file and change the format to the correct format.

I am only including a sample of the class since I have a few overloads for each of the constructors and some constants that are specific to my work, but you can see the idea.

using Microsoft.CSharp;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using static WorkbookTools.Shared;
using XL = Microsoft.Office.Interop.Excel;

namespace WorkbookTools
    {
    public class WorkbookFormatManager : IDisposable
        {



        #region Public Methods

        public bool CopySourceToTarget(string sourceFullFileName,string targetFileName)
            {
            int attempts = 0;

            bool success = false;
            SourceFullName = sourceFullFileName;

            TargetFullName = $@"{TargetFolder}{targetFileName}";// initialize the filename for checking its open/closed state

            while (!success && attempts < 3)
                {

                success = CopySourceToTarget ();

                targetFileName = IncrementedVersionFilename (targetFileName);
                attempts++;
                }
            return success;
            }

        public bool CopySourceToTarget()
            {
            bool success = false;
            if ((SourceFileName != TargetFileName) && File.Exists (SourceFullName))
                {
                try
                    {

                    File.Copy (SourceFullName,TargetFullName,true);
                    success = true;


                    }
                catch (System.Runtime.InteropServices.COMException ce)
                    {
                    TargetFileName = IncrementedVersionFilename (TargetFileName);
                    }
                catch (Exception ex)
                    {
                    Console.WriteLine (ex.Message + Environment.NewLine + ex.StackTrace);
                    }
                }
            else
                {
                success = true;

                }
            return success;
            }


        public void Dispose()
            {
            // Do not change this code. Put cleanup code in 'Dispose(bool disposing)' method
            Dispose (disposing: true);
            GC.SuppressFinalize (this);
            }

        public XL.Workbook GetWorkbook(string workbookFullName)
            {
            XL.Workbook wb = null;
            if (File.Exists (workbookFullName))
                {
                wb = XlApp.Workbooks.Open (workbookFullName);

                }
            else
                {
                wb = XlApp.Workbooks.Add ();
                wb.SaveAs (workbookFullName);
                }

            return wb;
            }

        public string IncrementedVersionFilename(string workbookName)
            {
            //TODO: Set private after unit testing is done. JMM 05-10-2023
            string originalName = workbookName;
            string newName = string.Empty;
            string rootName = originalName.Replace (".xlsx",string.Empty);
            int versionIndex = rootName.ToUpper ().LastIndexOf ("V");
            int incrementedVersion = 0;
            string versionPart = string.Empty;
            if (versionIndex > 0)   //there is a version number
                {
                versionPart = rootName.Substring (versionIndex,rootName.Length - versionIndex);
                int.TryParse (versionPart.Replace ("V",""),out incrementedVersion);
                if (incrementedVersion > 0)
                    {
                    incrementedVersion += 1;

                    newName = rootName.Replace (versionPart,$"V{incrementedVersion:0}.xlsx");
                    }
                }
            else
                {
                incrementedVersion = 1;
                newName = $"{rootName}{incrementedVersion:0}.xlsx";
                }

            return newName;
            }

        #endregion Public Methods


        #region Protected Methods

        protected virtual void Dispose(bool disposing)
            {
            try
                {
                if (!disposedValue)
                    {
                    if (disposing)
                        {
                        if (WbSource != null)
                            {
                            WbSource.Close ();
                            WbSource = null;
                            }
                        if (WbTarget != null)
                            {
                            WbTarget.Close ();
                            WbTarget = null;
                            }
                        if (XlApp != null)
                            {
                            XlApp.Quit ();
                            XlApp = null;
                            }
                        }

                    disposedValue = true;
                    }
                }
            catch (Exception ex)
                {
#if DEBUG
                Console.WriteLine ($"{ex.Message}");
#endif
                }
            finally
                {
                }
            }

        #endregion Protected Methods


        #region Public Properties

        public void FormatExcelRange(XL.Range rngTarget,ColumnFormat format)
            {

            rngTarget.NumberFormat = FormatCode (format);
            WbTarget.Save ();

            }

        public void FormatExcelRange(XL.Worksheet targetSheet,string rangeDefinition,ColumnFormat format)
            {
            XL.Range rngTarget = targetSheet.Columns[$@"{rangeDefinition}"];

            FormatExcelRange (rngTarget,format);
            }

        public void FormatExcelRange(string targetSheetName,string rangeDefinition,ColumnFormat format)
            {
            XL.Worksheet wsTarget = WbTarget.Worksheets[targetSheetName];
            XL.Range rngTarget = wsTarget.Columns[$@"{rangeDefinition}"];

            FormatExcelRange (rngTarget,format);

            }

        public XL.Worksheet GetTargetWorksheet(XL.Workbook targetWorkbook,string targetWorksheetName)
            {
            XL.Worksheet xlWorksheet = targetWorkbook.Worksheets[targetWorksheetName];
            return xlWorksheet;
            }

        public string ArchiveFolder { get; set; } = string.Empty;
        public string FormatString { get => formatString ?? @"@"; set => formatString = value ?? @"@"; }

        public XL.Range RngTarget { get; set; }
        public string SourceFileName { get; set; } = string.Empty;
        public string SourceFolder
            {
            get
                {
                if (!sourceFolder.EndsWith (@"\"))
                    {
                    sourceFolder = $@"{sourceFolder}\";
                    }
                return
                sourceFolder;
                }
            set => sourceFolder = value;
            }

        public string SourceFullName
            {
            get
                {
                if (sourceFullName == null)
                    {
                    sourceFullName = $@"{SourceFolder}{SourceFileName}";
                    }
                return sourceFullName;
                }
            set => sourceFullName = value;
            }

        public string TargetFileName
            {
            get
                {
                if (string.IsNullOrEmpty (targetFileName))
                    {
                    targetFileName = IncrementedVersionFilename (SourceFileName);
                    }
                return targetFileName;
                }
            set => targetFileName = value;
            }

        public string TargetFolder
            {
            get
                {
                if (string.IsNullOrEmpty (targetFolder))
                    {
                    targetFolder = SourceFolder;
                    }
                return targetFolder;
                }
            set => targetFolder = value;
            }

        public string TargetFullName
            {
            get
                {
                if (string.IsNullOrEmpty (targetFullName))
                    {
                    if (string.IsNullOrEmpty (targetFolder))
                        {
                        targetFolder = SourceFolder;
                        }
                    if (string.IsNullOrEmpty (targetFileName))
                        {
                        targetFileName = IncrementedVersionFilename (SourceFileName);
                        }
                    if (!targetFolder.EndsWith (@"\"))
                        {
                        targetFolder = $@"{targetFolder}\";
                        }
                    }
                targetFullName = $@"{targetFolder}{targetFileName}";
                return targetFullName;
                }
            set => targetFullName = value;
            }

        public XL.Workbook WbSource
            {
            get
                {
                if (wbSource == null)
                    {
                    wbSource = GetWorkbook (SourceFullName);
                    }
                return wbSource;
                }
            set => wbSource = value;
            }

        public XL.Workbook WbTarget
            {
            get
                {
                if (wbTarget == null)
                    {
                    wbTarget = GetWorkbook (TargetFullName);
                    }
                return wbTarget;
                }
            set => wbTarget = value;
            }

        public XL.Worksheet WsTarget { get => wsTarget ?? GetTargetWorksheet (WbTarget,WsTargetName); set => wsTarget = value; }

        public string WsTargetName { get; set; } = string.Empty;

        public XL.Application XlApp
            {
            get
                {
                if (xlApp == null)
                    {
                    xlApp = new XL.Application ();
                    }
                xlApp.DisplayAlerts = false;
                return xlApp;
                }
            set => xlApp = value;
            }


        #endregion Public Properties


        #region Private Fields

        private bool disposedValue;

        private string formatString = string.Empty;
        private string sourceFolder = string.Empty;

        private string sourceFullName;

        private bool success = false;
        private string targetFileName = string.Empty;

        private string targetFolder = string.Empty;

        private string targetFullName;


        private XL.Workbook wbSource = null;
        private XL.Workbook wbTarget = null;
        private XL.Worksheet wsTarget;
        private XL.Application xlApp = null;
        #endregion Private Fields

        #region Public Constructors

        // Simplest constructor using mostly defaults
        public WorkbookFormatManager(string sourcePath,string workbookName)
            {
            sourceFolder = sourcePath;

            if (!SourceFolder.EndsWith (@"\"))
                {
                SourceFolder = $@"{SourceFolder}\";
                }
            SourceFileName = workbookName;
            SourceFullName = $@"{SourceFolder}{SourceFileName}";
            TargetFolder = SourceFolder;
            if (!TargetFolder.EndsWith (@"\"))
                {
                TargetFolder = $@"{TargetFolder}\";
                }
            TargetFileName = IncrementedVersionFilename (SourceFileName);
            TargetFullName = $@"{TargetFolder}{TargetFileName}";
            }

        // Slightly more explicit constructor
        public WorkbookFormatManager(string sourcePath,string workbookName,string targetPath)
            {
            SourceFolder = sourcePath;
            if (!SourceFolder.EndsWith (@"\"))
                {
                SourceFolder = $@"{SourceFolder}\";
                }
            SourceFileName = workbookName;
            SourceFullName = $@"{SourceFolder}{SourceFileName}";
            TargetFolder = targetPath;
            if (!TargetFolder.EndsWith (@"\"))
                {
                TargetFolder = $@"{TargetFolder}\";
                }
            TargetFileName = IncrementedVersionFilename (workbookName);
            TargetFullName = $@"{TargetFolder}{TargetFileName}";
            }

        // most explicit constructor
        public WorkbookFormatManager(string sourcePath,string workbookName,string targetPath,string targetWorkbookName)
            {
            SourceFolder = sourcePath;
            if (!SourceFolder.EndsWith (@"\"))
                {
                SourceFolder = $@"{SourceFolder}\";
                }
            SourceFileName = workbookName;
            SourceFullName = $@"{SourceFolder}{SourceFileName}";
            TargetFolder = targetPath;
            if (!TargetFolder.EndsWith (@"\"))
                {
                TargetFolder = $@"{TargetFolder}\";
                }
            TargetFileName = targetWorkbookName;
            TargetFullName = $@"{TargetFolder}{TargetFileName}";
            }

        #endregion Public Constructors

        // // TODO: override finalizer only if 'Dispose(bool disposing)' has code to free unmanaged
        // resources ~RangeFormatter() { // Do not change this code. Put cleanup code in
        // 'Dispose(bool disposing)' method Dispose(disposing: false); }
        }
    }

And the shared class referenced is:

namespace WorkbookTools
    {
    public static class Shared
        {
        public static string FormatCode(ColumnFormat format)
            {
            string formatString = "General";
            switch (format)
                {
                case ColumnFormat.Text:
                    formatString = @"@";
                    break;

                case ColumnFormat.Decimal:
                    formatString = "#,##0.00";
                    break;

                case ColumnFormat.Integer:
                    formatString = "#,##0";
                    break;

                case ColumnFormat.Currency:
                    formatString = "$#,##0.00";
                    break;

                case ColumnFormat.DateTime:
                    formatString = "yyyy-mm-dd hh:nn";
                    break;

                case ColumnFormat.ShortDate:
                    formatString = "yyyy-mm-dd";
                    break;

                default:
                    formatString = "General";
                    break;
                }
            return formatString;
            }

        public enum ColumnFormat
            {
            General = 0,
            Currency = 1,
            Decimal = 2,
            DateTime = 3,
            Percentage = 4,
            ShortDate = 5,
            Text = 6,
            Integer = 7
            }
        }
    }

All I need to do now is figure out where exactly to insert the script task, so as to ensure that any file is updated before it is connected to the data source component. I am also using a boolean variable so that the formatting has to be completed before the next task runs, requiring both

Success AND workbookIsFormatted==true;

I am still cleaning it up, but I hope this gives the idea.

One caution: If using this inside the SSIS package pay attention to the implementation of the IDisposable interface. You need to ensure that the Excel workbook is saved and closed before trying to read from it in the SSIS data flow task. Otherwise you will get an access error. The overloads of the copying code is to allow the code to save to a new name with an incremented version number if there is an access conflict, but it is limited to 3 attempts after which you just have to ping someone and tell them to close the file!

I hope this is helpful.

Joey Morgan

Upvotes: 0

fireshark519
fireshark519

Reputation: 175

Thanks to Caius Jard for his answer. I found a solution for my problem, I tried changing the output file format of the report to CSV but this made it worse lol. with CSV it simply would not scan the cells at all and assign everything as string which caused issues with importing. I then tried using .xlsx (2007 excel) which meant a new connection manager and got this as the connection string:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=L:\MACROS\SSIS\Input\A2_POST_TEST20190103214110525.xlsx;Extended Properties="EXCEL 12.0 XML;HDR=NO";

Instead of adding again what Caius suggested I tried changing it to this:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=L:\MACROS\SSIS\Input\A2_POST_TEST20190103214110525.xlsx;Extended Properties="EXCEL 12.0 XML;HDR=NO;IMEX=1";

this fixed my problem!

Upvotes: 0

Caius Jard
Caius Jard

Reputation: 74700

It sounds like the Excel driver isn't reading enough data when guessing the datatype. In addition to setting ;Extended Properties="IMEX=1" in the connection string as per the comments, set the TypeGuessRows registry key to 0 according to which version of office, probably located at one of the following keys:

  • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
  • HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
  • HKEY_LOCAL_MACHINE\Software\Microsoft\Office\OFFICE NUMERICAL VERSION\Access Connectivity Engine\Engines\Excel\TypeGuessRows
  • HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Office\OFFICE NUMERICAL VERSION\Access Connectivity Engine\Engines\Excel\TypeGuessRows

Setting TypeGuessRows to 0 causes the entire column to be scanned when guessing the datatype. Setting IMEX=1 causes data to be returned as text (this can be altered in the registry) when mixed values are encountered. Omitting IMEX=1 causes data that does not match the guessed datatype to be returned as null.. IMEX is thus less important than TypeGuessRows, as setting it can only make a reasonable difference if enough variety is encountered in the first 8 rows (default scan) for columns that exhibit variety

http://microsoft-ssis.blogspot.com/2011/06/mixed-data-types-in-excel-column.html

Upvotes: 1

Related Questions