Reputation: 132
what I'm trying to do is a simple sumif for about 200k lines of data which causes problems for excel. Basically my list looks like this
List of Companies Dummy1 Dummy2
Company A 0 1
Company A 0 1
Company A 1 1
Company B 1 1
Company B 0 1
Company B 0 1
....
and if there is a 1 in any row of column B for a specific company I need to plug a 1 in each row of column C for this company.
So Dummy 2 is basically the sum over Dummy 1 for all entries for a specific company.
The data is already sorted by column A.
Anyway, Excel goes crazy.
Is it just plain stupid what I'm doing here because I'm generating too many comparative operations?
What would be an easy way to accomplish what I'm trying to do here?
Upvotes: 0
Views: 7023
Reputation:
According to your sample data, filling C2:C200000 with,
=SUMIF(A:A, A2, B:B)
... will be performing 3× as many SUMIF
calculations as is necessary. An IF
formula only processes the part that is TRUE or FALSE depending on how the criteria resolves so changing the formula to something like the following,
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)
... should drastically reduce the processing in a calculation cycle. The degree of improvement will depend upon how many duplicate company values are in column A and whether column A has been sorted to keep the company names together. The smaller the number of unique companies, the more improvement you will see. In short, unless the company changes from row to row, the SUMIF
is not calculated.
Sample Calculation Timing Environment:
Test 1: 26 companies (Company A to Company Z), each with ~7683 entries in column A, sorted. Column B random 0's and 1's reverted to values. C2:C200000 cleared, worksheet calculated then formula filled in C2:C200000 and new calculation cycle timed to completion.
formula calculation cycle (hh:mm:ss)
=SUMIF(A:A, A2, B:B) 00:21:44
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1) 00:00:09
Test 2: 5000 companies (Company 0001 to Company 5000), each with ~40 entries in column A, sorted. Column B random 0's and 1's reverted to values. C2:C200000 cleared, worksheet calculated then formula filled in C2:C200000 and new calculation cycle timed to completion.
formula calculation cycle (hh:mm:ss)
=SUMIF(A:A, A2, B:B) 00:22:10
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1) 00:00:37
You cannot magically break the physical laws of time and space but sometimes you can fool them. This solution may not be perfect but perhaps it is something that you can live with.
On a related note, large(r) worksheets benefit from having their formulas reverted to result values once calculations have been made if those results are not likely to change on a regular basis. While Copy, Paste Special, Values is a reasonably quick method of accomplishing this, selecting a large number of cells containing formulas and running the following sub macro is lightning quick.
sub sel_2_Value
application.enableevents = false
selection = selection.value
application.enableevents = true
end sub
If locale differences are not important (currency, dates, etc) then selection = selection.value2
is even better.
The only thing that will slow down the above operation is formulas with dependents within the range being reverted to values as they will be recalculated.
Upvotes: 1
Reputation: 97
I think the better way to solve this is by ussing pivot table, you can sum Dummy1 by company and get the data as summary.
Here is an examples:
http://www.excel-easy.com/data-analysis/pivot-tables.html
I hope this help
Upvotes: 1