Thomas
Thomas

Reputation: 132

SumIf with lots of data in Excel

what I'm trying to do is a simple sumif for about 200k lines of data which causes problems for excel. Basically my list looks like this

List of Companies    Dummy1  Dummy2  
Company A             0       1
Company A             0       1
Company A             1       1 
Company B             1       1
Company B             0       1
Company B             0       1
....

and if there is a 1 in any row of column B for a specific company I need to plug a 1 in each row of column C for this company.
So Dummy 2 is basically the sum over Dummy 1 for all entries for a specific company. The data is already sorted by column A. Anyway, Excel goes crazy. Is it just plain stupid what I'm doing here because I'm generating too many comparative operations? What would be an easy way to accomplish what I'm trying to do here?

Upvotes: 0

Views: 7023

Answers (2)

user4039065
user4039065

Reputation:

According to your sample data, filling C2:C200000 with,

=SUMIF(A:A, A2, B:B)

... will be performing 3× as many SUMIF calculations as is necessary. An IF formula only processes the part that is TRUE or FALSE depending on how the criteria resolves so changing the formula to something like the following,

=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)

... should drastically reduce the processing in a calculation cycle. The degree of improvement will depend upon how many duplicate company values are in column A and whether column A has been sorted to keep the company names together. The smaller the number of unique companies, the more improvement you will see. In short, unless the company changes from row to row, the SUMIF is not calculated.

Sample Calculation Timing Environment:

  • Excel 2010 64-bit (14.0.7015.1000) running under Windows 7 Pro on a business class i5 laptop w/8Gbs DRAM.
  • XLSB; Calculation Manual; Recalculate workbook before saving OFF; Save AutoRecovery information OFF

Test 1: 26 companies (Company A to Company Z), each with ~7683 entries in column A, sorted. Column B random 0's and 1's reverted to values. C2:C200000 cleared, worksheet calculated then formula filled in C2:C200000 and new calculation cycle timed to completion.

 formula                                        calculation cycle (hh:mm:ss)
=SUMIF(A:A, A2, B:B)                                    00:21:44
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)                    00:00:09

Test 2: 5000 companies (Company 0001 to Company 5000), each with ~40 entries in column A, sorted. Column B random 0's and 1's reverted to values. C2:C200000 cleared, worksheet calculated then formula filled in C2:C200000 and new calculation cycle timed to completion.

 formula                                        calculation cycle (hh:mm:ss)
=SUMIF(A:A, A2, B:B)                                    00:22:10
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)                    00:00:37

      Elapsed time for calculation

You cannot magically break the physical laws of time and space but sometimes you can fool them. This solution may not be perfect but perhaps it is something that you can live with.

On a related note, large(r) worksheets benefit from having their formulas reverted to result values once calculations have been made if those results are not likely to change on a regular basis. While Copy, Paste Special, Values is a reasonably quick method of accomplishing this, selecting a large number of cells containing formulas and running the following sub macro is lightning quick.

sub sel_2_Value
    application.enableevents = false
    selection = selection.value
    application.enableevents = true
end sub

If locale differences are not important (currency, dates, etc) then selection = selection.value2 is even better.

The only thing that will slow down the above operation is formulas with dependents within the range being reverted to values as they will be recalculated.

Upvotes: 1

Issac Pe&#241;a
Issac Pe&#241;a

Reputation: 97

I think the better way to solve this is by ussing pivot table, you can sum Dummy1 by company and get the data as summary.

Here is an examples:

http://www.excel-easy.com/data-analysis/pivot-tables.html

enter link description here

I hope this help

Upvotes: 1

Related Questions