Umut K
Umut K

Reputation: 1388

conditional remove duplicates

I have a list where i want to remove the duplicate where "Date of Termination (Work)" column is the latest or null.

sample data

Id No   Name    Surname Date of Employment (Work)   Date of Termination (Work)
12405   xxxx    yyy     10/26/2018                  2/6/2019
33418   mmm     nnnn    1/1/2018                    7/30/2018
33418   mmm     nnnn    1/13/2017                   12/31/2017
33616   rrrr    sssss   7/13/2018                   11/19/2018
33616   rrrr    sssss   7/13/2018                   null
48224   ttttt   kkkk    7/15/2018                   4/14/2019

the result should be

Id No   Name    Surname Date of Employment (Work)   Date of Termination (Work)
12405   xxxx    yyy     10/26/2018                  2/6/2019
33418   mmm     nnnn    1/1/2018                    7/30/2018
33616   rrrr    sssss   7/13/2018   
48224   ttttt   kkkk    7/15/2018                   4/14/2019

Upvotes: 0

Views: 131

Answers (3)

chillin
chillin

Reputation: 4486

I get expected output:

Output

using the code below:

let
    initialTable = Table.FromColumns({
        {12405, 33418, 33418, 33616, 33616, 48224},
        {"xxxx", "mmm", "mmm", "rrrr", "rrrr", "ttttt"},
        {"yyy", "nnnn", "nnnn", "sssss", "sssss", "kkkk"},
        {#date(2018, 10, 26), #date(2018, 01, 01), #date(2017, 1, 13), #date(2018, 7, 13), #date(2018, 7, 13), #date(2018, 7, 15)},
        {#date(2019, 02, 06), #date(2018, 7, 30), #date(2017, 12, 31), #date(2018, 11, 19), null, #date(2019, 4, 14)}
    }, type table [Id No = Int64.Type, Name = text, Surname = text, #"Date of Employment (Work)" = date, #"Date of Termination (Work)" = date]),
    nullElseMaxComparer = (x as record, y as record) =>
        let
            a = Record.Field(x, "Date of Termination (Work)"),
            b = Record.Field(y, "Date of Termination (Work)"),
            comparison = if a = null then 2 else if b = null then -2 else Value.Compare(a, b)
        in comparison,
    maxOrNullPerGroup = Table.Group(initialTable, "Id No", {"toCombine", each Table.Max(_, nullElseMaxComparer)}),
    combined = Table.FromRecords(maxOrNullPerGroup[toCombine])
in
    combined
  • If I understand correctly, for each ID No, you want the latest termination date.
  • In the expected output shown in the question and specifically for ID No 33616, null was preferred over 11/19/2018. So I've assumed that, for the purposes of this question, null is greater than any date.
  • nullElseMaxComparer is a custom comparison function which tries to give preference to null values -- and can be passed directly to Table.Max.

Upvotes: 1

Aleksei Zhigulin
Aleksei Zhigulin

Reputation: 1634

Another way:

= Table.Distinct(Table.Sort(YourTable, {"Date of Termination (Work)", 1}), {"Id No"})

enter image description here

Upvotes: 0

Umut K
Umut K

Reputation: 1388

i found a workaround (group by max) but i am sure there is an easier and faster way

let
    Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Id No", Int64.Type}, {"Name", type text}, {"Surname", type text}, {"Date of Employment (Work)", type datetime}, {"Date of Termination (Work)", type datetime}, {"Duty", type text}, {"Citizenship", type text}, {"National ID", type text}, {"Passport Serial/No", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Id No", Order.Ascending}, {"Date of Termination (Work)", Order.Descending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Id No"}, {{"all", each _, type table [Id No=number, Name=text, Surname=text, #"Date of Employment (Work)"=datetime, #"Date of Termination (Work)"=datetime, Duty=text, Citizenship=text, National ID=text, #"Passport Serial/No"=text]}, {"maxdate", each List.Max([#"Date of Termination (Work)"]), type datetime}}),
    #"Expanded all" = Table.ExpandTableColumn(#"Grouped Rows", "all", {"Name", "Surname", "Date of Employment (Work)", "Date of Termination (Work)", "Duty", "Citizenship", "National ID", "Passport Serial/No"}, {"Name", "Surname", "Date of Employment (Work)", "Date of Termination (Work)", "Duty", "Citizenship", "National ID", "Passport Serial/No"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded all", each ([#"Date of Termination (Work)"] = [maxdate])  ),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"maxdate"})
in
    #"Removed Columns"

Upvotes: 0

Related Questions