Reputation: 2339
I have sources that send me data. Before saving that data to the sink, I want to keep only the rows that are distinct with respect to certain columns. I can't find a Distinct function among the expression functions. Can anyone tell me how to achieve this?
Upvotes: 1
Views: 13220
Reputation: 2560
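This can be done by manually editing the data flow script (and then linking the transformation up in the UI). The following snippet filters to distinct rows using all columns: it groups rows by a SHA-256 hash of every column and keeps the first value of each column within each group: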
aggregate(groupBy(mycols = sha2(256,columns())),
each(match(true()), $$ = first($$))) ~> DistinctRows
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-script#distinct-row-using-all-columns
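If the rows only need to be distinct with respect to some of the columns, as the question asks, the same pattern can probably group by just those columns instead of a hash of all of them. A minimal sketch, assuming an incoming stream named source1 and key columns CustomerId and OrderId (all three names, and the output name DistinctByKey, are placeholders and not from the question). The column pattern copies every non-key column using first():

```
source1 aggregate(groupBy(CustomerId, OrderId),
    each(match(name != 'CustomerId' && name != 'OrderId'), $$ = first($$))) ~> DistinctByKey
```

Note that first() just takes whichever row the group sees first, so if it matters which duplicate survives, the rows would need to be ordered or aggregated differently.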
Upvotes: 2
Reputation: 121
Not sure if you still have this problem, but I suggest using the 'Aggregate' transformation in the data flow. I ran the following test:
In 'Aggregate settings', I defined the 'Group by' columns and the 'Aggregates' columns. The source table has 9 columns and 900 rows: 450 distinct rows plus 450 duplicates.
I used max to aggregate the 'ModifiedDate' column, and the sink table ended up with only the 450 distinct rows.
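In script form, these Aggregate settings might look roughly like the following. This is only a sketch: source1, DistinctRows, and the Col1 to Col3 group-by names are placeholders (the names of the other columns in the test table were not given, and presumably all of them went into 'Group by'). Only ModifiedDate and the use of max come from the test itself.

```
source1 aggregate(groupBy(Col1, Col2, Col3),
    ModifiedDate = max(ModifiedDate)) ~> DistinctRows
```

Rows that match on every group-by column collapse into one output row carrying the latest ModifiedDate.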
Upvotes: 4