Reputation: 333
I want to know what the fastest way is of reading and writing data to and from an open Excel workbook to c# objects. The background is that I want to develop a c# application that is used from Excel and uses data held in excel.
The business logic will reside in the c# application but the data will reside in an Excel workbook. The user will be using Excel and will click a button (or do something similar) on the excel workbook to initiate the c# application. The c# application will then read data off the excel workbook, process the data, and then write data back to the excel workbook.
There may be numerous blocks of data that are required to be read off and written back to the excel workbook but they will normally be of a relatively small size, say 10 rows and 20 columns. Occasionally a large list of data may need to be processed, of the order of 50,000 rows and 40 columns.
I know that this is relatively easy to do say using VSTO but I want to know what the fastest (but still robust and elegant) solution is and get an idea of the speed. I don't mind if the solution recommends using third party products or uses C++.
The obvious solution is using VSTO or interop but I don't know what the performance is like versus VBA which I'm currently using to read in the data, or if there are any other solutions.
This was posted on experts exchange saying that VSTO was dramatically slower than VBA but that was a couple of years ago and I don't know if the performance has improved.
http://www.experts-exchange.com/Microsoft/Development/VSTO/Q_23635459.html
Thanks.
Upvotes: 33
Views: 23012
Reputation: 16907
I'll take this as a challenge, and will bet the fastest way to shuffle your data between Excel and C# is to use Excel-DNA - http://excel-dna.net. (Disclaimer: I develop Excel-DNA. But it's still true...)
Because it uses the native .xll interface it skips all the COM integration overhead that you'd have with VSTO or another COM-based add-in approach. With Excel-DNA you could make a macro that is hooked up to a menu or ribbon button which reads a range, processes it, and writes it back to a range in Excel. All using the native Excel interface from C# - not a COM object in sight.
I've made a small test function that takes the current selection into an array, squares every number in the array, and writes the result into Sheet 2 starting from cell A1. You just need to add the (free) Excel-DNA runtime which you can download from http://excel-dna.net.
I read into C#, process and write back to Excel a million-cell range in under a second. Is this fast enough for you?
My function looks like this:
using ExcelDna.Integration;
public static class RangeTools {
[ExcelCommand(MenuName="Range Tools", MenuText="Square Selection")]
public static void SquareRange()
{
object[,] result;
// Get a reference to the current selection
ExcelReference selection = (ExcelReference)XlCall.Excel(XlCall.xlfSelection);
// Get the value of the selection
object selectionContent = selection.GetValue();
if (selectionContent is object[,])
{
object[,] values = (object[,])selectionContent;
int rows = values.GetLength(0);
int cols = values.GetLength(1);
result = new object[rows,cols];
// Process the values
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < cols; j++)
{
if (values[i,j] is double)
{
double val = (double)values[i,j];
result[i,j] = val * val;
}
else
{
result[i,j] = values[i,j];
}
}
}
}
else if (selectionContent is double)
{
double value = (double)selectionContent;
result = new object[,] {{value * value}};
}
else
{
result = new object[,] {{"Selection was not a range or a number, but " + selectionContent.ToString()}};
}
// Now create the target reference that will refer to Sheet 2, getting a reference that contains the SheetId first
ExcelReference sheet2 = (ExcelReference)XlCall.Excel(XlCall.xlSheetId, "Sheet2"); // Throws exception if no Sheet2 exists
// ... then creating the reference with the right size as new ExcelReference(RowFirst, RowLast, ColFirst, ColLast, SheetId)
int resultRows = result.GetLength(0);
int resultCols = result.GetLength(1);
ExcelReference target = new ExcelReference(0, resultRows-1, 0, resultCols-1, sheet2.SheetId);
// Finally setting the result into the target range.
target.SetValue(result);
}
}
Upvotes: 47
Reputation: 12157
If the C# application is a stand-alone application, then you will always have cross-process marshaling involved that will overwhelm any optimizations you can do by switching languages from, say, C# to C++. Stick to your most preferred language in this situation, which sounds like is C#.
If you are willing to make an add-in that runs within Excel, however, then your operations will avoid cross-process calls and run about 50x faster.
If you run within Excel as an add-in, then VBA is among the fastest options, but it does still involve COM and so C++ calls using an XLL add-in would be fastest. But VBA is still quite fast in terms of calls to the Excel object model. As for actual calculation speed, however, VBA runs as pcode, not as fully compiled code, and so executes about 2-3x slower than native code. This sounds very bad, but it isn't because the vast majority of the execution time taken with a typical Excel add-in or application involves calls to the Excel object model, so VBA vs. a fully compiled COM add-in, say using natively compiled VB 6.0, would only be about 5-15% slower, which is not noticeable.
VB 6.0 is a compiled COM approach, and runs 2-3x faster than VBA for non-Excel related calls, but VB 6.0 is about 12 years old at this point and won't run in 64 bit mode, say if installing Office 2010, which can be installed to run 32 bit or 64 bit. Usage of 64 bit Excel is tiny at the moment, but will grow in usage, and so I would avoid VB 6.0 for this reason.
C#, if running in-process as an Excel add-in would execute calls to the Excel object model as fast as VBA, and execute non-Excel calls 2-3x faster than VBA -- if running unshimmed. The approach recommended by Microsoft, however, is to run fully shimmed, for example, by making use of the COM Shim Wizard. By being shimmed, Excel is protected from your code (if it's faulty) and your code is fully protected from other 3rd party add-ins that could otherwise potentially cause problems. The down-side to this, however, is that a shimmed solution runs within a separate AppDomain, which requires cross-AppDomain marshaling that incurrs an execution speed penalty of about 40x -- which is very noticeable in many contexts.
Add-ins using Visual Studio Tools for Office (VSTO) are automatically loaded within a shim and executes within a separate AppDomain. There is no avoiding this if using VSTO. Therefore, calls to the Excel object model would also incur an approximately 40x execution speed degradation. VSTO is a gorgeous system for making very rich Excel add-ins, but execution speed is its weakness for applications such as yours.
ExcelDna is a free, open source project that allows you to use C# code, which is then converted for you to an XLL add-in that uses C++ code. That is, ExcelDna parses your C# code and creates the required C++ code for you. I've not used it myself, but I am familiar with the process and it's very impressive. ExcelDna gets very good reviews from those that use it. [Edit: Note the following correction as per Govert's comments below: "Hi Mike - I want add a small correction to clarify the Excel-Dna implementation: all the managed-to-Excel glue works at runtime from your managed assembly using reflection - there is no extra pre-compilation step or C++ code generation. Also, even though Excel-Dna uses .NET, there need not be any COM interop involved when talking to Excel - as an .xll the native interface can be used directly from .NET (though you can also use COM if you want). This makes high-performance UDFs and macros possible." – Govert]
You also might want to look at Add-in Express. It's not free, but it would allow you to code in C# and although it shims your solution into a separate AppDomain, I believe that it's execution speed is outstanding. If I am understanding its execution speed correctly, then I'm not sure how Add-in Express doing this, but it might be taking advantage of something called FastPath AppDomain marshaling. Don't quote me on any of this, however, as I'm not very familiar with Add-in Express. You should check it out though and do your own research. [Edit: Reading Charles Williams' answer, it looks like Add-in Express enables both COM and C API access. And Govert states that Excel DNA also enables both COM and the fastrer C API access. So you'd probably want to check out both and compare them to ExcelDna.]
My advice would be to research Add-in Express and ExcelDna. Both approaches would allow you to code using C#, which you seem most familiar with.
The other main issue is how you make your calls. For example, Excel is very fast when handling an entire range of data passed back-and-forth as an array. This is vastly more efficient than looping through the cells individually. For example, the following code makes use of the Excel.Range.set_Value accessor method to assign a 10 x 10 array of values to a 10 x 10 range of cells in one shot:
void AssignArrayToRange()
{
// Create the array.
object[,] myArray = new object[10, 10];
// Initialize the array.
for (int i = 0; i < myArray.GetLength(0); i++)
{
for (int j = 0; j < myArray.GetLength(1); j++)
{
myArray[i, j] = i + j;
}
}
// Create a Range of the correct size:
int rows = myArray.GetLength(0);
int columns = myArray.GetLength(1);
Excel.Range range = myWorksheet.get_Range("A1", Type.Missing);
range = range.get_Resize(rows, columns);
// Assign the Array to the Range in one shot:
range.set_Value(Type.Missing, myArray);
}
One can similarly make use of the Excel.Range.get_Value accessor method to read an array of values from a range in one step. Doing this and then looping through the values within the array is vastly faster than looping trough the values within the cells of the range individually.
Upvotes: 40
Reputation: 459
First off, your solution cannot be an Excel UDF (user-defined function). In our manuals, we give the following definition: "Excel UDFs are used to build custom functions in Excel for the end user to use them in formulas." I wouldn't mind if you suggest a better definition :)
That definition shows that a UDF cannot add a button to the UI (I know that XLLs can modify the CommandBar UI) or intercept keyboard shortcuts as well as Excel events.
That is, ExcelDNA is out of scope because it is purposed for developing XLL add-ins. The same applies to Excel-targeted functionality of Add-in Express since it allows developing XLL add-ins and Excel Automation add-ins.
Because you need to handle Excel events, your solution can be a standalone application but there are obvious limitations of such approach. The only real way is to create a COM add-in; it allows handling Excel events and adding custom things to the Excel UI. You have three possibilities:
If talking about developing an Excel COM add-in, the 3 tools above provide different features: visual designers, shimming, etc. But I don't think they differ in the speed of accessing the Excel Object Model. Say, I don't know (and cannot imagine) why getting a COM object from the Default AppDomain should differ from getting the same COM object from another AppDomain. BTW, you can check if shimming influences the speed of operation by creating a shared add-in and then using the COM Shim Wizard to shim it.
Speed II. As I wrote to you yesterday: "The best way to speed up reading and writing to a range of cells is to create a variable of the Excel.Range type referring to that range and then read/write an array from/to the Value property of the variable." But contrary to what Francesco says, I don't attribute this to VSTO; this is a feature of the Excel object model.
Speed III. The fastest Excel UDFs are written in native C++, not in any .NET language. I haven't compared the speed of an XLL add-in produced by ExcelDNA and Add-in Express; I don't think you'll find any substantial difference here.
To sum up. I am convinced you are on a wrong way: COM add-ins based on Add-in Express, VSTO or Shared Add-in should read and write Excel cells at the same speed. I will be glad (sincerely) if someone disproves this statement.
Now on your other questions. VSTO doesn't allow developing a COM add-in supporting Office 2000-2010. It requires three different codebases and at least two versions of Visual Studio to completely support Office 2003-2010; you need to have strong nerves and a portion of good luck to deploy a VSTO-based add-in for Excel 2003. With Add-in Express, you create a COM add-in for all Office versions with a single codebase; Add-in Express provides you with a setup project, which is ready to install your add-in in Excel 2000-2010 (32-bit and 64-bit); ClickOnce deployment is on board too.
VSTO beats Add-in Express in one area: it allows creating so-called document-level add-ins. Imagine a workbook or template with some .NET code behind it; I wouldn't be surprised, however, if deployment of such things is a nightmare.
On Excel events. All Excel events are listed in MSDN, for instance, see Excel 2007 events
Regards from Belarus (GMT+2),
Andrei Smolin Add-in Express Team Leader
Upvotes: 3
Reputation: 15794
Further to Mike Rosenblum's comments on the use of arrays, I'd like to add that I've been using the very approach (VSTO + arrays) and when I measured it, the actual read speed itself was within milliseconds. Just remember to disable event handling and screen updating prior to the read/write, and remember to re-enable after the operation is complete.
Using C#, you can create 1-based arrays exactly the same as Excel VBA itself does. This is pretty useful, especially because even in VSTO, when you extract the array from an Excel.Range object, the array is 1-based, so keeping the Excel-oriented arrays 1-based helps you avoid needing to always check for whether the array is one-based or zero-based. (If the column position in the array has significance to you, having to deal with 0-based and 1-based arrays can be a real pain).
Generally reading the Excel.Range into an array would look something like this:
var myArray = (object[,])range.Value2;
My variation of Mike Rosenblum's array-write uses a 1-based array like this:
int[] lowerBounds = new int[]{ 1, 1 };
int[] lengths = new int[] { rowCount, columnCount };
var myArray =
(object[,])Array.CreateInstance(typeof(object), lengths, lowerBounds);
var dataRange = GetRangeFromMySources();
// this example is a bit too atomic; you probably want to disable
// screen updates and events a bit higher up in the call stack...
dataRange.Application.ScreenUpdating = false;
dataRange.Application.EnableEvents = false;
dataRange = dataRange.get_Resize(rowCount, columnCount);
dataRange.set_Value(Excel.XlRangeValueDataType.xlRangeValueDefault, myArray);
dataRange.Application.ScreenUpdating = true;
dataRange.Application.EnableEvents = true;
Upvotes: 4
Reputation: 23550
The fastest interface to Excel data is the C API. There are a number of products out there that link .NET to Excel using this interface.
2 products I like that do this are Excel DNA (which is free and open source) and Addin Express (which is a commercial product and has both the C API and COM interface available).
Upvotes: 3
Reputation: 52511
I've used VBA code (macro) to gather & compact the data, and get this data in one call to C#, and vice versa. This will probably be the most performant approach.
Using C#, you will always need to use some marshalling. Using VSTO or COM Interop, the underlaying communication layer (marshalling overhead) is the same.
In VBA (Visual Basic For Application) you work directly on the objects in Excel. So the access to this data will always be faster.
But.... Once you have the data in C#, the manipulation of this data can be a lot faster.
If you are using VB6 or C++, you also go through a COM interface, and you will also be facing cross process marshalling.
So you are looking for a method to minimize cross process calls and marshalling.
Upvotes: 0