Scott Boston
Scott Boston

Reputation: 153500

Follow-up Dax Drop Duplicates

Similiar to a question asked here,

Given, this table, I want to only keep the records where the email appears first.

email firstname Lastname Address City Zip
[email protected] Scott Johnson A Z 1111
[email protected] Bill Johnson B Y 2222
[email protected] Ted Smith C X 3333
[email protected] Steve Williams D W 4444
[email protected] Sam Samford E U 5555
[email protected] David Beals F V 6666
[email protected] Stephen Jackson G T 7777
[email protected] Seven Alberts H S 8888

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>email</th>
      <th>firstname</th>
      <th>Lastname</th>
      <th>Address</th>
      <th>City</th>
      <th>Zip</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>[email protected]</td>
      <td>Scott</td>
      <td>Johnson</td>
      <td>A</td>
      <td>Z</td>
      <td>1111</td>
    </tr>
    <tr>
      <td>[email protected]</td>
      <td>Bill</td>
      <td>Johnson</td>
      <td>B</td>
      <td>Y</td>
      <td>2222</td>
    </tr>
    <tr>
      <td>[email protected]</td>
      <td>Ted</td>
      <td>Smith</td>
      <td>C</td>
      <td>X</td>
      <td>3333</td>
    </tr>
    <tr>
      <td>[email protected]</td>
      <td>Steve</td>
      <td>Williams</td>
      <td>D</td>
      <td>W</td>
      <td>4444</td>
    </tr>
    <tr>
      <td>[email protected]</td>
      <td>Sam</td>
      <td>Samford</td>
      <td>E</td>
      <td>U</td>
      <td>5555</td>
    </tr>
    <tr>
      <td>[email protected]</td>
      <td>David</td>
      <td>Beals</td>
      <td>F</td>
      <td>V</td>
      <td>6666</td>
    </tr>
    <tr>
      <td>[email protected]</td>
      <td>Stephen</td>
      <td>Jackson</td>
      <td>G</td>
      <td>T</td>
      <td>7777</td>
    </tr>
    <tr>
      <td>[email protected]</td>
      <td>Seven</td>
      <td>Alberts</td>
      <td>H</td>
      <td>S</td>
      <td>8888</td>
    </tr>
  </tbody>
</table>

Expected output table:

email firstname Lastname Address City Zip
[email protected] Scott Johnson A Z 1111
[email protected] Steve Williams D W 4444
[email protected] Sam Samford E U 5555
[email protected] Seven Alberts H S 8888

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>email</th>
      <th>firstname</th>
      <th>Lastname</th>
      <th>Address</th>
      <th>City</th>
      <th>Zip</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>[email protected]</td>
      <td>Scott</td>
      <td>Johnson</td>
      <td>A</td>
      <td>Z</td>
      <td>1111</td>
    </tr>
    <tr>
      <td>[email protected]</td>
      <td>Steve</td>
      <td>Williams</td>
      <td>D</td>
      <td>W</td>
      <td>4444</td>
    </tr>
    <tr>
      <td>[email protected]</td>
      <td>Sam</td>
      <td>Samford</td>
      <td>E</td>
      <td>U</td>
      <td>5555</td>
    </tr>
    <tr>
      <td>[email protected]</td>
      <td>Seven</td>
      <td>Alberts</td>
      <td>H</td>
      <td>S</td>
      <td>8888</td>
    </tr>
  </tbody>
</table>

Upvotes: 1

Views: 84

Answers (2)

sergiom
sergiom

Reputation: 4887

Assuming that we added an Index column with non duplicate values, it's possible to reduce the number of context transitions to only one per Email by preparing an Indexes table containing the indexes to be selected, and then apply this Indexes table as a filter using TREATAS.

T Index Unique = 
VAR Indexes =
    SELECTCOLUMNS(
        ALL( 'T Index'[Email] ),
        "MinIndex", CALCULATE( MIN( 'T Index'[Index] ) )
    )
RETURN
    CALCULATETABLE( 'T Index', TREATAS( Indexes, 'T Index'[Index] ) )

Result

If instead we have non-unique column across the different Emails but unique per each email, like a timestamp, we can prepare a filter table containing the email and the timestamp

For instance with a T Date table like the following

T Date table

The calculated table becomes

T Date Unique = 
VAR EmailDate =
    ADDCOLUMNS(
        ALL( 'T Date'[Email] ),
        "MinDate", CALCULATE( MIN( 'T Date'[Date] ) )
    )
RETURN
    CALCULATETABLE( 'T Date', TREATAS( EmailDate, 'T Date'[Email], 'T Date'[Date] ) )

T Date Unique

Upvotes: 1

Alexis Olson
Alexis Olson

Reputation: 40204

There is no inherent ordering of a table in DAX, so in order to take the first row you need to add an index column or define an ordering on the table somehow.

For this answer, I'll assume that you've added an index column somehow (in the query editor or with a DAX calculated column).

You can create a filtered table as follows:

FilteredTable1 =
FILTER (
    Table1,
    Table1[Index]
        = CALCULATE ( MIN ( Table1[Index] ), ALLEXCEPT ( Table1, Table1[email] ) )
)

For each row in Table1, this checks if the index is minimal over all the rows with the same email.

Result

Upvotes: 2

Related Questions