Reputation: 3257
I have a table called 'Table1' which has two columns being 'Name' and 'Date'.
In Power Query Editor, I want to create a custom column called 'Last Date' to find the most recent date when the given Name appeared last time.
For instance, B firstly appeared on 5/7/2019, then on 11/8/2019, and last on 17/9/2019. So the Last Date for B when the Date is 11/8/2019 is 5/7/2019, and the Last Date for B when the Date is 17/9/2019 is 11/8/2019. Please see below example.
I did some research online but got confused with custom function, VAR, Measure, MAXX, CALCULATE, FILTER, etc...
I am not familiar with DAX nor Advanced DAX Editor so if possible please provide detailed answer or clear steps of how to apply your solution.
Let me know if I did not make myself clear. Otherwise your prompt help is greatly appreciated!!
Upvotes: 1
Views: 7824
Reputation: 4486
A custom function might look something like:
let
AddLastDateColumn = (someTable as table) as table =>
let
initialHeaders = Table.ColumnNames(someTable),
sorted = Table.Sort(someTable, {{"Date", Order.Ascending}, {"Name", Order.Ascending}}),
merged = Table.NestedJoin(sorted, {"Name"}, sorted, {"Name"}, "$joined", JoinKind.LeftOuter),
lastDateColumn = Table.AddColumn(merged, "Last Date", each
let
maxDate = [Date],
filtered = Table.SelectRows([#"$joined"], each [Date] < maxDate),
lastRow = if not Table.IsEmpty(filtered) then Table.Last(filtered)[Date] else null // Could use Table.Max, but data is already sorted.
in lastRow,
type nullable date),
dropColumns = Table.SelectColumns(lastDateColumn, initialHeaders & {"Last Date"})
in dropColumns
in
AddLastDateColumn
If you save the above as its own query, you can then access it in other queries. For example, if you save the above as a query named AddLastDateColumn
, you can then access
it in other queries (as below):
let
sourceTable =
let
nameColumn = {"A", "B", "C", "D", "E", "F", "G", "A", "B", "C", "D", "E", "F", "G", "A", "B", "C", "D", "E", "F", "G"},
dateColumn = {#date(2019,7,1), #date(2019,7,5), #date(2019,7,14), #date(2019,7,23), #date(2019,7,24), #date(2019,8,1), #date(2019,8,5), #date(2019,8,10), #date(2019,8,11), #date(2019,8,17), #date(2019,8,23), #date(2019,8,25), #date(2019,9,3), #date(2019,9,4), #date(2019,9,13), #date(2019,9,17), #date(2019,9,23), #date(2019,9,27), #date(2019,9,28), #date(2019,10,6), #date(2019,10,9)},
toTable = Table.FromColumns({nameColumn, dateColumn}, type table [Name = text, Date = date])
in toTable,
invokeFunction = AddLastDateColumn(sourceTable)
in
invokeFunction
I was looking at the documentation for Table.NestedJoin
(https://learn.microsoft.com/en-us/powerquery-m/table-nestedjoin). It seems to have a parameter named keyEqualityComparers
:
An optional set of
keyEqualityComparers
may be included to specify how to compare the key columns.
I don't have time to look into its capabilities and the syntax it requires, but perhaps it could be used to more elegantly specify the JOIN criteria: "Name
must match exactly. Right Date
must be the greatest date that is less than Left Date
".
In any case, I think the above function should do what you were after.
Upvotes: 1
Reputation: 3257
After being directed to the right direction by RADO, I searched a couple more similar Power Query cases using M language and built-in GUI, I have found a way to solve my question.
1) firstly sort my table ascending in the following orders: Name, Date;
2) add two index columns, one starts with 0 and the other starts with 1, and then merge the table itself matching index column 0 with index column 1;
3) expand the merged columns [Name.1] and [Date.1], and then add a custom column called "Last Date" with the following if function
if [Name]=[Name.1] then [Date.1] else null
4) Remove other columns just leave [Name], [Date] and [Last Date] column.
The above method is a bit tedious if I have a few tables that need the same operation. If anyone can provide a 'Custom Function' solution using a single query which can be applied to other tables that will be greatly appreciated!
Upvotes: 0
Reputation: 8148
Power Query and Power BI are two different tools.
Power Query is designed to process data. It uses language called "m". For example, if you need to import and merge files, fix bad data, etc - that's what you use.
Power BI is designed to analyze data. It uses language called "DAX". Most of the time, Power BI/DAX are useful to design interactive analytics - reports that respond to slicers, filters, etc.
Some people use DAX to enhance their data instead of Power Query - like adding a calculated column. Personally, I think it's a bad idea, but perhaps Power Query/m is too non-intuitive for them and DAX is easier. I will explain how to add a calculated column using DAX. If for some reason you prefer Power Query, please mention that in your question.
First, you must be in a main Power BI window, not in a Power Query window. Go to your data model, and select your table. On tab "model", click "add column". Enter DAX formula:
Last Date =
VAR Current_Date = Table1[Date]
VAR Current_Name = Table1[Name]
RETURN
CALCULATE( MAX(Table1[Date]),
Table1[Date] < Current_Date,
Table1[Name] = Current_Name )
The formula will generate a new column with the previous date for each Name.
How it works:
For example, for name "B", first highlight: it will first filter the table leaving only records where name = "B" (3 records). Then, it will further filter these 3 records to find dates < 11/08/2019, which is one record: 5/07/2019.
As a side note, I recommend to read at least one good book on Power BI/DAX, or take an online training class. This tool is not simple, and you will waste a lot of time if you try to learn it by trial and error.
Upvotes: 1