Roger
Roger

Reputation: 33

Need DAX formula to rank and eliminate duplicates

I have a pivot table that looks like this:

enter image description here

Problem: Donors don't add up to the grand total, like the giving does. This is because a donor's age range is determined for each and every gift, based on the date of that gift, and over the course of a year a donor can cross ranges.

Request: I would like a DAX formula to limit donors to one range. In an ideal world, it would be based on which age range they fall in for the majority of the year, but I would settle for just arbitrarily keeping one of them. I believe in SQL you could go about this by partitioning and ranking.

Sample of Underlying Table:

| Donation_ID | Donor_ID | Donation_Date | Amount | age at time of gift | summary_range |
|-------------|----------|---------------|--------|---------------------|---------------|
|           1 |      100 | 3/15/2017     |    400 |                  39 | <40           |
|           2 |      101 | 4/3/2017      |     50 |                  69 | 60-69         |
|           3 |      100 | 5/30/2017     |     15 |                  40 | 40-49         |
|           4 |      101 | 10/7/2017     |     20 |                  69 | 60-69         |
|           5 |      100 | 1/23/2018     |    220 |                  40 | 40-49         |
|           6 |      101 | 2/17/2018     |     25 |                  70 | 70+           |


UPDATE - I got the following code to work in DaxStudio. But then it failed in Excel, saying "summarizecolumns cannot have outside filter context." Per the footnote at the bottom of the following page, it is apparently just a limitation of Excel: https://www.sqlbi.com/articles/introducing-summarizecolumns/

EVALUATE(

// filter context of the pivot table EXCEPT no filter on age range
var fc = CALCULATETABLE(
    data_table,
    data_table[Donation_Date] >= date(2017,3,1),
    data_table[Donation_Date] <= date(2018,2,28)
)

var hh = SUMMARIZECOLUMNS(data_table[Donor_ID], data_table[summary_range],data_table[age at time of gift], fc)

var ranked = 
ADDCOLUMNS(
    hh,
    "RankByAge",
    RANKX (
        FILTER(
            SUMMARIZECOLUMNS(
                data_table[Donor_ID],data_table[age at time of gift],
                hh
            ),
            data_table[Donor_ID] = EARLIER(data_table[Donor_ID])
        ),
    data_table[age at time of gift],
    ,
    desc,
    DENSE
    )
)

return 

// ultimately need to count the rows rather than just return them
// the second criteria would come from the filter context in Excel
FILTER(ranked, [RankByAge] = 1 && [summary_range] = "<40" )

)

Upvotes: 0

Views: 1542

Answers (1)

Alexis Olson
Alexis Olson

Reputation: 40204

I think the simplest approach would be to create a couple of calculated columns to use instead.

Let's create a unique age for each donor by picking their age for their latest donation.

MaxAge =
    CALCULATE(
        MAX(data_table[ageattimeofgift]),
        ALLEXCEPT(data_table, data_table[Donor_ID])
    )

And then look up the range associated with this age.

MaxRange =
    LOOKUPVALUE(
        data_table[summary_range],
        data_table[ageattimeofgift],
        data_table[MaxAge]
    )

Use this instead of summary_range in your pivot table.

(Note: You can partition the MaxAge by fiscal year as well if you like so that a donor can move between groups when viewing years separately.)


A more dynamic approach would be to use a measure which can read in filter context.

Distinct Donors =
VAR CurrentRange =
    VALUES ( data_table[summary_range] )
VAR Summary =
    SUMMARIZE (
        ALLSELECTED ( data_table ),
        data_table[Donor_ID],
        "MaxAge", MAX ( data_table[ageattimeofgift] ),
        "Amount", SUM ( data_table[Amount] )
    )
VAR MaxRange =
    ADDCOLUMNS (
        Summary,
        "MaxRange",
        LOOKUPVALUE (
            data_table[summary_range],
            data_table[ageattimeofgift], [MaxAge]
        )
    )
RETURN
COUNTROWS(
  FILTER(
    MaxRange,
    CONTAINS(
        CurrentRange,
        [summary_range],
        [MaxRange]
    )
  )

)

Notice that I used SUMMARIZE instead of SUMMARIZECOLUMNS. Please refer to this article for more information on the differences and limitations of these functions in different environments.

You can use the following after the RETURN instead to get the corresponding amount given.

SUMX ( FILTER ( MaxRange, [MaxRange] IN CurrentRange ), [Amount] )

Note: This IN syntax above is a newer feature. Use the CONTAINS function for backward compatibility.

Upvotes: 1

Related Questions