Enjay
Enjay

Reputation: 13

How do I group rows based on a fixed sum of values in Excel?

I am trying to find another solution to below Excel formula that was already provided here:

How do I create groups based on the sum of values?

It is the same requirement, but the grouping criteria needs to be an exact value.

Here's the sample data:

Column A | Column B
Item A | 1
Item B | 2
Item C | 3
Item D | 4
Item E | 5
Item F | 1
Item G | 2
Item H | 3
Item I | 4
Item J | 5

I need to group the rows if their Column B sum = 5.

Expected result:

Group 1 = Item A, Item D (1 + 4) = 5
Group 2 = Item B, Item C (2 + 3) = 5
Group 3 = Item E = 5
Group 4 = Item F, Item I (1 + 4) = 5
Group 5 = Item G, Item H (2 + 3) = 5
Group 6 = Item J = 5

If a row's Column B exceeds 5 or does not have another matching row to equal 5 when added then it will have no Group value.

Groupings can be interchangeable, ie. Group 1 = Item A, Item I can be made since 1 + 4 = 5.

I assume this can be achieved using Excel formulas but I am struggling to find which formula(s) can be used. Any help is appreciated!

Upvotes: 0

Views: 735

Answers (1)

David Leal
David Leal

Reputation: 6759

I believe I was able to understand your question after some comments exchanged. Anyway I would recommend to update your question, it is an interesting problem, but the question was difficult to follow.

Before looking for an Excel solution, I took the approach of understanding the problem as a state machine with the transition from one state to another. I considered the following states that represent the position the item in the group. A group is defined as consecutive items that the sum of all items is equal to 5.

  1. EMPTY: Just the initial situation
  2. START: Start of the group
  3. MIDDLE: A middle element of the group
  4. END: The end of the group
  5. START-END: A group with a single element
  6. NA: Not applicable group

I follow the same idea of: How do I create groups based on the sum of values?, but slightly different helper columns:

  1. Total (Column D), but for this case it is used the following formula: IF(SUM(C3,D2)>5,C3,SUM(C3,D2))

  2. Status or item position within Group (Column G). Here is where it is calculated the corresponding status for each element

  3. Checks for Valid Groups (Column H): Evaluates if a group is valid. When there is no match to 5, the group is not valid. It is indicated at the row that represents the beginning of the group (START or START-END states). If TRUE it means a valid group, if FALSE it is not a valid group, and NA for an NA value from Status column. If empty represents any element of the group that is not the first one.

  4. Group # (Column I): To identify the group the row (Item) belongs to. Notice that we start counting the group from 1 and I also consider the case a group can not be formed (NA).

Here is a screenshot with the solution and the formula on G3:

=LET(total, D3, prevS, G2, QTY, C3,
    IF(C3="", "",
        IF(OR(AND(total=5, QTY<5, prevS="START"), AND(total=5, prevS="MIDDLE")), "END",
            IF(OR(AND(total>5, total=QTY, OR(prevS="START", prevS="MIDDLE")),AND(total>5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "NA",
                IF(OR(AND(total<5, total=QTY, OR(prevS="START", prevS="MIDDLE")),AND(total<5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "START", 
                    IF(AND(total<5, OR(prevS="START", prevS="MIDDLE")), "MIDDLE",
                        IF(OR(AND(total=5, total= QTY, OR(prevS="START", prevS="MIDDLE")),AND(total=5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "START-END", "UNDEFINED")
                    )
                )
            )
        )
    )
)

Notes::

  1. LET Excel function is used to have something more readable

  2. The IF blocks should to be ordered from the most specific case of total and QTY values to the most generic ones. For the case with same total condition, make sure the second condition for prevS are not repeated.

  3. Added as a last resort UNDEFINED case, to check if any transition was not covered, if that is the case it has to be reviewed, so far in the sample data all cases are covered

sample excel file

Column K-Q is just for documenting purpose to identify all possible transitions. Column K-M provides all possible transitions organized them by previous status. The columns O-Q represent all possible transitions ordered by current status, so it is easier to formulate each portion of the IF blocks.

Maybe the formula can be simplified, compared to the solution provided by the similar question is more complex, but this question has more specific conditions. Some transitions maybe not relevant for the final result, but it is preferred to consider all positions in the group to make sure all transitions are covered.

The following state machine diagram shows all possible transitions:

state machine diagram

Notes:

As you can see the solution also considers when a group cannot be created or non valid groups (NA values). The solution considers that Item column has only positive values, it is not stated in the question any restriction, but looking at the example they are all positives. To consider zero values, this solution needs to be adjusted.

Checks for Valid Groups column is calculated as follow:

= IF(G3="", "", 
    IF(G3="START-END", TRUE,
        IF(G3="NA", "NA",
            IF(G3="START",
                LET(endRow, IFNA(MATCH("START", LEFT(G4:$G$1000,5),0), MATCH("", LEFT(G4:$G$1000,5),0))+ ROW()-1, 
                    value, VLOOKUP("END", G4:INDIRECT( "G" & endRow),1,0), 
                    IF(ISNA(value), FALSE, TRUE)
                ), ""
            )
        )
    )
)

It identifies the start and end of the group, and then finds any NA values, if there are, then it is not a valid group. If the end of the candidate group is not found (the first MATCH returns N/A), then is searches until a blank row

Group # column is calculated has follow:

=IF(C3="","", LET(value, MAX($I$2:I2), IF(G3="NA", "NA", 
  IF(H3=TRUE, value + 1, IF(H3=FALSE, "NA", 
  IF(I2="NA", "NA", value))))))

This way only valid transaction are considered, i.e. the following status transitions starting from START but not ending in END : START->NA, START->MIDDLE[one or more]...->NA and NA are not considered valid groups (NA).

I added more examples from the original sample file provided, more can be added to further test all possible scenarios, but I guess you get the idea about this approach. As you sated "I assume this can be achieved using Excel formulas" yes it is possible, but I would say for more complex conditions I would suggest to implement a state machine algorithm in VBA. Even it is possible to do it with Excel functions, you have to deal with several nested IF blocks and helper columns, something that can be achieved with a simple for-loop in VBA.

Here is a link to online Excel file I used.

Upvotes: 1

Related Questions