Reputation: 67
I am trying to find duplicate records in Power Query. I do not need to remove them, I would just need to identify them. I am putting up a sample here.
As you can see in the image below, the first record is actually not a duplicate unlike the row below. I am pretty new to using power query and all I did here was what I would generally do in EXCEL:
I was wondering if somebody could tell me what could I be doing wrong here. Thank you
Upvotes: 1
Views: 8630
Reputation: 21318
What you want to do is Right-click the ID column, Group By...
Click [Advanced]
It should default to Count Rows in the first row at bottom
[Add Aggregation] with New Column Name 'All" and operation 'All Rows'
Hit [OK]
Click on arrows at top of All column and [x] the ID column and any other columns that may have been there, here assumed to be called Other
The count column will be >1 for dupe rows.
Then you can remove extra columns and filter if wanted
Or add a custom column with formula
=if [Count]>1 then "Dupe" else "Single"
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"ID"}, {{"Count", each Table.RowCount(_), type number}, {"All", each _, type table}}),
#"Expanded All" = Table.ExpandTableColumn(#"Grouped Rows", "All", {"ID", "Other"}, {"ID.1", "Other"})
in #"Expanded All"
Upvotes: 4