Reputation: 123
I have data in a GCS bucket, and I want to create a new row_number() column and use it to find the record with the max ID for each MEMBER_ID in the original data. For example, below is my raw data.
ID MEMBER_ID SERVICE
3 234 xyz
4 234 abc
1 123 hyts
4 876 bts
10 876 xyz
I want the output below written to my BigQuery table.
ID MEMBER_ID SERVICE
4 234 abc
1 123 hyts
10 876 xyz
Can you please suggest a possible way to do this in Cloud Data Fusion?
Upvotes: 0
Views: 428
Reputation: 336
You can use the Deduplicate plugin (found under the Analytics section in Pipeline Studio) to keep the row with the max ID for each MEMBER_ID.
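To make the intended result concrete, here is a minimal Python sketch of the logic the deduplication step is expected to perform on the sample data: group by MEMBER_ID and keep the row with the largest ID. This is not Data Fusion code. The function name `keep_max_id_per_member` and the dict-based row layout are assumptions made for illustration only. In the plugin itself this presumably corresponds to choosing MEMBER_ID as the unique field and a max filter on ID, but check the exact property names in the plugin documentation.

```python
def keep_max_id_per_member(rows):
    """Return one row per MEMBER_ID: the row with the largest ID.

    Hypothetical helper for illustration; it mimics the expected
    outcome of the deduplication, not the plugin's implementation.
    """
    best = {}  # MEMBER_ID -> best row seen so far (first-seen order is kept)
    for row in rows:
        current = best.get(row["MEMBER_ID"])
        if current is None or row["ID"] > current["ID"]:
            best[row["MEMBER_ID"]] = row
    return list(best.values())


# Sample data from the question.
raw = [
    {"ID": 3, "MEMBER_ID": 234, "SERVICE": "xyz"},
    {"ID": 4, "MEMBER_ID": 234, "SERVICE": "abc"},
    {"ID": 1, "MEMBER_ID": 123, "SERVICE": "hyts"},
    {"ID": 4, "MEMBER_ID": 876, "SERVICE": "bts"},
    {"ID": 10, "MEMBER_ID": 876, "SERVICE": "xyz"},
]

result = keep_max_id_per_member(raw)
for row in result:
    print(row["ID"], row["MEMBER_ID"], row["SERVICE"])
```

Run on the sample data, this prints the three rows shown in the desired output above (4 234 abc, 1 123 hyts, 10 876 xyz), so a max-per-group deduplication is enough and no separate row_number() column is needed.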
Upvotes: 1