Reputation: 3736
I have the following data in my source:
I want to filter the data such that output will only include data with most recent date. For example: DEF has 2 values 2.5, 3-Jan and 4, 4-Jan. I want rows with most recent date (4, 4-Jan) so that the output will contain the following result. How do I do that in data flow?
Upvotes: 0
Views: 2360
Reputation: 8660
I tried it successfully.Please follow the steps.
First step:
create a source
(I used csv file with the data you provided),RecentDate
column should be like '01-04-2020' rather than '04-Jan' because '04-Jan' can't be sorted.
The setting of Projection see the picture.
Second step:
create an aggregate
, please choose 'Movie' column in the Group By
configuration, and the setting of Aggregates
please see the image.
Third step:
create a new branch
(click '+' at the bottom right corner of source)
Fourth step:
create a join
and setting please see the image.
The last step:
create a select
and remove two duplicate column('Movie' and 'RecentDate') then output the sink
.
Hope these can help you.
Below is all images:
Update Answer:
Below is how to output only max rate:
This is my test data:
Movie,MaxRate,RecentDate
ABC,3,02-01-2020
DEF,2.5,03-02-2020
DEF,4,04-01-2020
DEF,6,03-02-2020
This is total flow image:
First step:create a source.The setting of Projection like this
Second step: create an aggregate, please choose 'Movie' and 'RecentDate' column in the Group By configuration, and the setting of Aggregates is below:
Third step: create an aggregate, please choose 'Movie' column in the Group By configuration, and the setting of Aggregates is below:
Fourth step: create a join and setting please see the image
The last step: create a select and remove two duplicate column('Movie' and 'RecentDate') then name 'Rate' as 'MaxRate',finally output the sink.
Below is output:
Movie,MaxRate,RecentDate
ABC,3,02-01-2020
DEF,6,03-02-2020
If you want to output the min rate,just change the max($$)
to min($$)
in Second step.
If you want to output the both max and min rate, please do like above(max rate flow) until creatingselect
finish and then New branch
do min rate flow until creating select
finish,finally join two select and delete duplicate column.
If you have another question,please let me know.
Upvotes: 4