Reputation: 2548
I'm trying to understand some legacy Excel file (it works, but I would really like to understand how/why it's working).
There is a sheet for data input (input sheet
)and some code that is called to process data in the input sheet
. I found out that number of rows in the input sheet
is determined using a Lookup formula like this:
=LOOKUP(2;1/('Input sheet'!E1:E52863<>"");ROW(A:A))
According to what I have found about Lookup behaviour: •If the LOOKUP function can not find an exact match, it chooses the largest value in the lookup_range that is less than or equal to the value.
What does this ^-1 operation to the specified range? If E(x) is not empty -> it should turn into 1, but if it is empty - then it would be 1/0 -> that should produce #DIV/0!
error...
1/('Input sheet'!E1:E52863<>"")
The outcome is the same, if I replace 2 with any positive number (ok, tried only some, but it looks like this is the case). If I change lookup value to 0, then I get #N/A error -> •If the value is smaller than all of the values in the lookup_range, then the LOOKUP function will return #N/A
I am stuck... can anyone shed some light?
Upvotes: 3
Views: 800
Reputation: 59475
You might get some idea what the formula is doing (or any other formula) if you apply Evaluate Formula. Though since the principle is the same whether 3 rows or 52863 I'd suggest limiting the range, to speed things up if choosing Evaluate Formula. As usual with trying to explain formulae, it is best to start from the inside and work outwards. This:
'Input Sheet'!E1:E52863<>""
returns an array with a result for every entry in ColumnE from Row1 to Row52863. Since it is a comparison (<>
does not equal) the result is Boolean - ie TRUE (not empty) or FALSE (is empty). So if only the first half of E1 to E52863 is populated, the result is {TRUE;TRUE;TRUE
; ... and a LOT more TRUE;
... and FALSE
... and a LOT more ;FALSE
and finally }
.
Working outwards, the next step is to divide this array into 1
. In arithmetic operations Boolean TRUE is treated as 1
and FALSE as 0
, so the resultant array is {1;1;1;
... and a LOT more 1;
... and #DIV/0!
... and a LOT more ;#DIV/0!
and finally }
.
This then becomes the lookup_vector within which LOOKUP seeks the lookup_value. The lookup_value you show is 2
. But the array comprises either 1
or #DIV/0!
- so 2
will never be found in it. As you have noticed, that 2
could just as well be 3
, or 45
or 123
- anything as long as not a value present in the array.
That (not present) is necessary because LOOKUP stops searching when it finds a match. The fact that there is no match forces it to the end of the (valid) possibilities - ie the last 1
. At this point, in my opinion, it would be logical to return "not found" but - I suspect merely a quirk, though very convenient - it returns that 1
- by its index number in the list, ie 52863 if all cells in E1:E52863 are populated.
Although the result_vector (Row(A:A)
) is optional for LOOKUP it is required in this usage in effect to fix the start point for the index (effectively Row1, since an entire column). You might change that to say A3:A..
and the result would be the number of the highest populated row number in ColumnE plus 2 (3
-1
).
Upvotes: 2
Reputation: 14764
LOOKUP
has the rare ability to ignore errors. Conducting the 1/n
operation will produce an error every time n
is zero. False is the same as zero. So, for your formula, every empty cell produces an error in this calculation. All of those results are put in a vector array in the 2nd argument.
Searching for any positive value (the 1st argument) larger than 1
will result in LOOKUP finding the last non-error value in the above vector.
It also has the nice optional 3rd argument where you can specify the vector of results from which to return the lookup value. This is similar to the INDEX component of the the INDEX/MATCH combo.
In the case of your formula, the 3rd argument is an array that looks like this: {1;2;3;4;5;6;7;8;9;...n} where n is the last row number of the worksheet, which in modern versions of Excel is 1048576.
So LOOKUP returns the value from the vector in the 3rd argument that corresponds to the last non-error (non-blank cell) in the 2nd argument.
Note that this method of determining the last row will ignore cells that have formulas that result in a zero-length string. Such cells look blank but of course they are not. Depending on the situation, this may be precisely what you want. If, on the other hand you want to find the last row in column E that has a formula in it even if it results in a zero-length string, then this will do that:
=MATCH("";'Input sheet'!E:E;)
Upvotes: 4