GuidoT
GuidoT

Reputation: 312

Selecting Distinct Values from sorted Table

I have data query in my excel workbook to load data from a CSV-file. I want to use the Power Query-Editor to transform the data before working with them actually.

The simplified structure of the CSV-file are as shown in the html-table snippet below:

<table style="text-align:center">
  <tr>
    <th style="width:100px">Response-ID</th>
    <th style="width:80px">Task-ID</th>
    <th style="width:200px">Executed Action</th>
    <th style="width:200px">Status</th>
    <th style="width:80px">Date</th>
  </tr>

  <tr>
    <td>1</td>
    <td>A</td>
    <td>Some Task to be done</td>
    <td>Open</td>
    <td>20.01.2020</td>
  </tr>

  <tr>
    <td>2</td>
    <td>A</td>
    <td>Another Action done today</td>
    <td>Open</td>
    <td>25.01.2020</td>
  </tr>

  <tr>
    <td>3</td>
    <td>B</td>
    <td>New Task</td>
    <td>Open</td>
    <td>25.01.2020</td>
  </tr>

  <tr>
    <td>4</td>
    <td>A</td>
    <td>More actions in this matter</td>
    <td><b>Done</b></td>
    <td>26.01.2020</td>
  </tr>

  <tr>
    <td>5</td>
    <td>C</td>
    <td>Another Task</td>
    <td>Open</td>
    <td>27.02.2020</td>
  </tr>

  <tr>
    <td>6</td>
    <td>B</td>
    <td>Whatever</td>
    <td><b>Done</b></td>
    <td>29.02.2020</td>
  </tr>

  <tr>
    <td>7</td>
    <td>D</td>
    <td>Now what?</td>
    <td>Open</td>
    <td>02.03.2020</td>
  </tr>

  <tr>
    <td>8</td>
    <td>C</td>
    <td>Almost done...</td>
    <td>Open</td>
    <td>02.03.2020</td>
  </tr>
</table>

What I want to achieve is a Power Query M code to display only the latest entry from each unfinished Task-ID.

Here is what I have so far:

let
    Source = ResponseData //a copy of my actual data query "ResponseData" to work with in this case
    #"Transformed Table" = Table.Distinct(Table.Buffer(Table.Sort(Source, {{"Date", Order.Descending}})), "Task-ID"),
    #"Remove Finished Tasks" = Table.SelectRows(#"Transformed Table", each ([Status] <> "Done" and [Status] <> "Canceled"))
in
    #"Remove Finished Tasks"

This code actually returns a list of all open tasks. But against my expectation it only contains the oldest response-ID of each task instead of the newest one. Changing the sort-order in the #"Transformed Table"-line returns the very same result in a reversed order. So basically sorting the table has no influence on the resulting data at all...

EDITED the code above according to the answer by Ron Rosenfeld -> Buffering the sorted table does the trick.

Upvotes: 0

Views: 267

Answers (1)

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60174

For reasons I don't fully understand (seems to have something to do with query folding), Power Query Remove Duplicates when run on a certain data sources, apparently does not respect the sorting order. But if you use Table.Buffer, then it ensures that folding will not take place and that the Remove Duplicates will take place after the sorting/filtering.

Not sure which would be faster on large datasets, though -- Table.Buffer versus Table.Group

CSV File

Response-ID,Task-ID,Executed Action,Status,Date
1,A,Some Task to be done,Open,20.01.2020
2,A,Another Action done today,Open,25.01.2020
3,B,New Task,Open,25.01.2020
4,A,More actions in this matter,Done,26.01.2020
5,C,Another Task,Open,27.02.2020
6,B,Whatever,Done,29.02.2020
7,D,Now what?,Open,02.03.2020
8,C,Almost done...,Open,02.03.2020

First few lines of MCode is a bit different, but this should give you the gist:

let
    Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\test.csv"),[Delimiter=",", Columns=5, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Response-ID", Int64.Type}, {"Task-ID", type text}, {"Executed Action", type text}, {"Status", type text}, {"Date", type text}}),
    #"Changed Type with Locale" = Table.TransformColumnTypes(#"Changed Type", {{"Date", type date}}, "en-150"),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type with Locale", each ([Status] = "Open")),
    #"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Date", Order.Descending}}),
    #"Buffered Table" = Table.Buffer(#"Sorted Rows"),
    #"Removed Duplicates" = Table.Distinct(#"Buffered Table", {"Task-ID"}),
    #"Sorted Rows1" = Table.Sort(#"Removed Duplicates",{{"Task-ID", Order.Ascending}})
in
    #"Sorted Rows1"

Results

enter image description here

Upvotes: 2

Related Questions