Reputation: 397
Writing a C# Program where I am trying to figure out if something has a correct ID (ClVal). The data is being read from an Excel spreadsheet using Interop being transferred into a data table.
The Excel spreadsheet holds a list of named objects, who their owner is, and the ClVal of their owner.
So, let's say PeoplePower, INC. (a name I made up on the spot) owns three computers. Those three computers should all have the ClVal of 0000_Peop. What I need to do is make sure that if those computers are owned by PeoplePower, INC., that they have the correct ClVal.
One of my limitations is that I cannot simply compare a substring of the ClVal to the owner name. This is because if I take the substring "Peop" from the ClVal and look for it in the owner field, it would return true (exists) for anything with "Peop" somewhere in the name. I want to limit any false positives.
To add another layer of complexity, only the owners of a large number of machines are given their own unique ClVals. For everyone else, they are given a ClVal of "other". However, sometimes a machine of a large company is given the "other" ClVal.
There are basically three cases that we need to check, from an overall perspective:
If the computer has the correct ClVal (e.g a PeoplePower computer has the ClVal "0000_Peop"), we should assign that cell's value to 1. This helps operators identify machines with the correct ID at a glance, and allows us to enumerate correctly identified machines.
If the computer has an incorrect ClVal (i.e. a PeoplePower computer has the ClVal "Other"), we should assign that cell's value to 0. This helps operators identify "minor errors" - needs to be fixed but it can wait.
If the computer does not have a ClVal or it has the ClVal of another owner, the cells value should be a -1. This helps operators identify "major errors" that need to be fixed immediately.
So far, I have thought of one way to do this, but I wanted to know if there were any better/more efficient options. I have about 3500 rows of information to sort through currently, and that number is rising steadily, so I need a solution that will work with an even greater number of rows.
Idea:
string ClVal = Convert.ToString(((Excel.Range)excelStuff.xlWorksheet.Cells[rowIndex, 2]).Value2);
string name = Convert.ToString(((Excel.Range)excelStuff.xlWorksheet.Cells[rowIndex, 5]).Value2);
if (name.Contains("PeoplePower"))
{
string ProperClVal = "0000_Peop";
row[4] = testClVal(ClVal, ProperClVal);
}
//Repeat with else if for all of the major owners
else
{
if (ClVal == "Other")
{
row[4] = 1;
}
else
{
row[4] = -1;
}
}
//Outside of the while loop
private int testClVal(String reportedClVal, String ProperClVal)
{
if (reportedClVal == ProperClVal)
{
return 1;
}
else if (reportedClVal == "Other")
{
return 0;
}
else
{
return -1;
}
}
This is functional, but it's a bunch of if-then statements, and I've not gotten into the try-catch pieces for if my excel spreadsheet is malformed due to faulty data (which happens with some of the computers).
Is there a more efficient/better/faster way to get this done? With 3800 rows, it increases the load time for the datatable from 1.5 minutes to 2.5 -3 minutes.
Upvotes: 1
Views: 116
Reputation: 6103
Reading one value a time from excel is really slow. You should read all data in one step, like:
public object[,] GetArray(int topRow, int rows, int columns)
{
Range c1 = (Range)Worksheet.Cells[topRow + 1, 1];
Range c2 = (Range)Worksheet.Cells[topRow + 1 + rows - 1, columns];
Range range = Worksheet.get_Range(c1, c2);
return range.Value;
}
Upvotes: 0
Reputation: 5774
That's the fastest category of algorithm you'll be able to use for this kind of task, because your program needs to touch every row. So you're looking at having to optimize other parts of the program, or other patterns at large.
To use the terminology of Complexity Classes (https://en.wikipedia.org/wiki/Complexity_class) this is a O(n) task, meaning it'll take twice as long for 2000 entries as 1000 entries. The kind of operation you want to do is to examine every single value and make a decision that doesn't depend (very much) on how large that value is or how many values there are. In psuedocode:
1) for every Row in Sheet
2) lookup a fixed number of values
3) do a fixed number of comparisons
4) assign a result
5) end loop
Line 1 will be executed n
times. For each iteration through the loop you'll do a constant-cost set of operations; we'll say the cost is 1
. That means the total cost/complexity will be n
*1
=n
, which is why this is O(n).
To speed this up, I'd look at using something besides Excel interop (it's slower than NPOI https://npoi.codeplex.com/). Also see if you can refactor the problem: perhaps you don't really need to compute all the values at once but could "lazily-load" them https://en.wikipedia.org/wiki/Lazy_loading, or otherwise defer the calculation
Upvotes: 1
Reputation: 50251
How about
row[4] = ClVal == "0000_Peop" ? 1 : ClVal == "Other" ? 0 : -1
Also, there are a few stylistic things I'd like to mention.
string
, not String
everywhere. It is the accepted convention in C#.ProperClVal
, for example, looks like a class name to me.Upvotes: 1