Aiden Krain
Aiden Krain

Reputation: 17

How to remove unwanted row values and then merge rows based on a key in ADF/Azure SQL DB?

I have a table that looks like this:

ID Supplier Number Supplier Name Address Postcode City State First Name Last Name
1 13 Example.com Uwanted Data Unwanted Data
1 15 Example St 9999 Brisbane QLD
1 Unwanted Data Uwanted Data John Doe
2 16 New Example Services Uwanted Data Unwanted Data
2 15 Test Drive 6789 Melbourne VIC
2 Unwanted Data Uwanted Data Jane Test

Where "Unwanted Data" refers to values that are not needed in the final dataset. What I am looking to do is to remove the "Unwanted Data" values and then merge the rows that have the same key such that we end up with the following table:

ID Supplier Number Supplier Name Address Postcode City State First Name Last Name
1 13 Example.com 15 Example St 9999 Brisbane QLD John Doe
2 16 New Example Services 15 Test Drive 6789 Melbourne VIC Jane Test

Is there any way to do this within Azure Data Factory? Alternatively, I can connect to an Azure SQL Database to run any SQL commands that me be able to do the work.

Many thanks in advance.

Edit: In some cases, the unwanted data may be of a similar type or value to the required value in the column. For example, in the column "First Name" there may be Joe and John for one particular ID. However, the desired value is in the exact same location relative to every ID. That is, the desired first name is in the third row of each ID.

Upvotes: 0

Views: 190

Answers (1)

NiharikaMoola
NiharikaMoola

Reputation: 5074

As @Nick.McDermaid, @Anand Sowmithiran mentioned in the comments, identify the unwanted data and you can replace unwanted data/uwanted data in each column with blank or NULL and apply max() function to get the aggregated values.

  1. You can write a query directly in the Azure SQL database to get the expected results as below.

    select id ID, 
      max(case when [Supplier Number] = 'Uwanted Data' or [Supplier Number] = 'Unwanted Data' then '' else [Supplier Number] end) [Supplier Number], 
      max(case when [Supplier Name]= 'Uwanted Data' or [Supplier Name] = 'Unwanted Data' then ''  else [Supplier Name] end) [Supplier Name],
      max(case when [Address] = 'Uwanted Data' or [Address] = 'Unwanted Data' then '' else [Address] end) [Address],
      max(case when [Postcode]= 'Uwanted Data' or [Postcode] = 'Unwanted Data' then '' else [Postcode] end) [Postcode],
      max(case when [City]= 'Uwanted Data' or [City] = 'Unwanted Data' then '' else [City] end) [City],
      max(case when [State]= 'Uwanted Data' or [State] = 'Unwanted Data' then '' else [State] end) [State],
      max(case when [First Name]= 'Uwanted Data' or [First Name] = 'Unwanted Data' then '' else [First Name] end) [First Name],
      max(case when [Last Name]= 'Uwanted Data' or [Last Name] = 'Unwanted Data' then '' else [Last Name] end) [Last Name]
    from tb1
    group by id
    

enter image description here

  1. You use the same query in the Azure data factory copy activity source, by using the query option.

enter image description here

This gives your expected results only when there is a single correct value in a column per ID. It gives different results if there are more than 1 correct value in a column per ID value.

Upvotes: 1

Related Questions