Reputation: 312
I have data query in my excel workbook to load data from a CSV-file. I want to use the Power Query-Editor to transform the data before working with them actually.
The simplified structure of the CSV-file are as shown in the html-table snippet below:
<table style="text-align:center">
<tr>
<th style="width:100px">Response-ID</th>
<th style="width:80px">Task-ID</th>
<th style="width:200px">Executed Action</th>
<th style="width:200px">Status</th>
<th style="width:80px">Date</th>
</tr>
<tr>
<td>1</td>
<td>A</td>
<td>Some Task to be done</td>
<td>Open</td>
<td>20.01.2020</td>
</tr>
<tr>
<td>2</td>
<td>A</td>
<td>Another Action done today</td>
<td>Open</td>
<td>25.01.2020</td>
</tr>
<tr>
<td>3</td>
<td>B</td>
<td>New Task</td>
<td>Open</td>
<td>25.01.2020</td>
</tr>
<tr>
<td>4</td>
<td>A</td>
<td>More actions in this matter</td>
<td><b>Done</b></td>
<td>26.01.2020</td>
</tr>
<tr>
<td>5</td>
<td>C</td>
<td>Another Task</td>
<td>Open</td>
<td>27.02.2020</td>
</tr>
<tr>
<td>6</td>
<td>B</td>
<td>Whatever</td>
<td><b>Done</b></td>
<td>29.02.2020</td>
</tr>
<tr>
<td>7</td>
<td>D</td>
<td>Now what?</td>
<td>Open</td>
<td>02.03.2020</td>
</tr>
<tr>
<td>8</td>
<td>C</td>
<td>Almost done...</td>
<td>Open</td>
<td>02.03.2020</td>
</tr>
</table>
What I want to achieve is a Power Query M code to display only the latest entry from each unfinished Task-ID.
Here is what I have so far:
let
Source = ResponseData //a copy of my actual data query "ResponseData" to work with in this case
#"Transformed Table" = Table.Distinct(Table.Buffer(Table.Sort(Source, {{"Date", Order.Descending}})), "Task-ID"),
#"Remove Finished Tasks" = Table.SelectRows(#"Transformed Table", each ([Status] <> "Done" and [Status] <> "Canceled"))
in
#"Remove Finished Tasks"
This code actually returns a list of all open tasks. But against my expectation it only contains the oldest response-ID of each task instead of the newest one. Changing the sort-order in the #"Transformed Table"
-line returns the very same result in a reversed order. So basically sorting the table has no influence on the resulting data at all...
EDITED the code above according to the answer by Ron Rosenfeld -> Buffering the sorted table does the trick.
Upvotes: 0
Views: 267
Reputation: 60174
For reasons I don't fully understand (seems to have something to do with query folding), Power Query Remove Duplicates
when run on a certain data sources, apparently does not respect the sorting order. But if you use Table.Buffer
, then it ensures that folding will not take place and that the Remove Duplicates
will take place after the sorting/filtering.
Not sure which would be faster on large datasets, though -- Table.Buffer
versus Table.Group
CSV File
Response-ID,Task-ID,Executed Action,Status,Date
1,A,Some Task to be done,Open,20.01.2020
2,A,Another Action done today,Open,25.01.2020
3,B,New Task,Open,25.01.2020
4,A,More actions in this matter,Done,26.01.2020
5,C,Another Task,Open,27.02.2020
6,B,Whatever,Done,29.02.2020
7,D,Now what?,Open,02.03.2020
8,C,Almost done...,Open,02.03.2020
First few lines of MCode is a bit different, but this should give you the gist:
let
Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\test.csv"),[Delimiter=",", Columns=5, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Response-ID", Int64.Type}, {"Task-ID", type text}, {"Executed Action", type text}, {"Status", type text}, {"Date", type text}}),
#"Changed Type with Locale" = Table.TransformColumnTypes(#"Changed Type", {{"Date", type date}}, "en-150"),
#"Filtered Rows" = Table.SelectRows(#"Changed Type with Locale", each ([Status] = "Open")),
#"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Date", Order.Descending}}),
#"Buffered Table" = Table.Buffer(#"Sorted Rows"),
#"Removed Duplicates" = Table.Distinct(#"Buffered Table", {"Task-ID"}),
#"Sorted Rows1" = Table.Sort(#"Removed Duplicates",{{"Task-ID", Order.Ascending}})
in
#"Sorted Rows1"
Results
Upvotes: 2