ZCoder

Reputation: 2339

Azure Data Flow filter distinct rows

I have sources that send me data, and before saving that data to the sink I want to filter it down to distinct rows with respect to certain columns. I am not able to find a Distinct function among the expression functions. Can anyone tell me how to achieve this?

Upvotes: 1

Views: 13220

Answers (2)

Balint Bako

Reputation: 2560

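This can be done by manually editing the data flow script (and then linking the new transformation in the UI). The following snippet filters distinct rows using all columns: it groups rows by a SHA-256 hash of every column and keeps the first value of each column within each group.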

aggregate(groupBy(mycols = sha2(256,columns())),
    each(match(true()), $$ = first($$))) ~> DistinctRows

https://learn.microsoft.com/en-us/azure/data-factory/data-flow-script#distinct-row-using-all-columns
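To illustrate what that script does, here is a small Python sketch of the same idea: hash all column values of each row and keep only the first row seen per hash. This is not ADF code. The helper names `row_hash` and `distinct_rows` are hypothetical, and the way values are serialized before hashing is an assumption that will not match ADF's internal `sha2(256, columns())` byte-for-byte.

```python
import hashlib
from typing import Any

Row = dict[str, Any]


def row_hash(row: Row) -> str:
    """Hash every column value of a row, loosely mimicking sha2(256, columns())."""
    # Assumption: values are serialized with repr() and joined by a unit separator,
    # with column names sorted so the hash does not depend on dict key order.
    # ADF's own serialization inside sha2() may differ.
    payload = "\x1f".join(repr(row[col]) for col in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def distinct_rows(rows: list[Row]) -> list[Row]:
    """Keep the first row seen for each all-columns hash."""
    first_by_hash: dict[str, Row] = {}
    for row in rows:
        key = row_hash(row)
        # Similar in spirit to each(match(true()), $$ = first($$)):
        # only the first row in each group survives.
        if key not in first_by_hash:
            first_by_hash[key] = row
    return list(first_by_hash.values())


if __name__ == "__main__":
    sample = [
        {"Id": 1, "Name": "a"},
        {"Id": 1, "Name": "a"},  # exact duplicate of the first row
        {"Id": 2, "Name": "b"},
    ]
    print(distinct_rows(sample))  # [{'Id': 1, 'Name': 'a'}, {'Id': 2, 'Name': 'b'}]
```

Hashing all columns into a single key means the group-by does not need every column listed by name, which is why the script version keeps working when the schema changes.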

Upvotes: 2

AM07300222

Reputation: 121

Not sure if you still have this problem, but I suggest using the 'Aggregate' transformation in the data flow. I tested it as follows:


In 'Aggregate Settings', we define every column under either 'Group by' or 'Aggregates'. The source table has 9 columns and 900 rows in total: 450 distinct rows plus 450 duplicated rows.


I used max to aggregate the 'ModifiedDate' column, and the sink table ended up with only the 450 distinct rows.
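The Aggregate approach can be sketched in Python as "group by the remaining columns, take max of one column". This is only an illustration of the logic, not ADF code; the function name `aggregate_max` and the column names `Id` and `Name` are hypothetical, and `ModifiedDate` stands in for the column aggregated in the test above.

```python
from datetime import date
from typing import Any

Row = dict[str, Any]


def aggregate_max(rows: list[Row], group_by: list[str], agg_col: str) -> list[Row]:
    """Group rows by the `group_by` columns and keep max(agg_col) per group."""
    groups: dict[tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(row[col] for col in group_by)
        best = groups.get(key)
        # Like an Aggregate transformation, the output holds only the
        # group-by columns plus the aggregated column.
        if best is None or row[agg_col] > best[agg_col]:
            groups[key] = {**{col: row[col] for col in group_by}, agg_col: row[agg_col]}
    return list(groups.values())


if __name__ == "__main__":
    sample = [
        {"Id": 1, "Name": "a", "ModifiedDate": date(2023, 1, 1)},
        {"Id": 1, "Name": "a", "ModifiedDate": date(2023, 1, 1)},  # exact duplicate
        {"Id": 2, "Name": "b", "ModifiedDate": date(2023, 3, 5)},
    ]
    print(len(aggregate_max(sample, ["Id", "Name"], "ModifiedDate")))  # 2
```

For exact duplicates, max simply returns the shared value, so any aggregate that picks one value per group would work; max additionally keeps the latest date when duplicates differ only in that column.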

Upvotes: 4
