How to remove unwanted row values and then merge rows based on a key in ADF/Azure SQL DB?

Question

I have a table that looks like this:

ID	Supplier Number	Supplier Name	Address	Postcode	City	State	First Name	Last Name
1	13	Example.com					Uwanted Data	Unwanted Data
1			15 Example St	9999	Brisbane	QLD
1	Unwanted Data	Uwanted Data					John	Doe
2	16	New Example Services					Uwanted Data	Unwanted Data
2			15 Test Drive	6789	Melbourne	VIC
2	Unwanted Data	Uwanted Data					Jane	Test

Where "Unwanted Data" refers to values that are not needed in the final dataset. What I am looking to do is to remove the "Unwanted Data" values and then merge the rows that have the same key such that we end up with the following table:

ID	Supplier Number	Supplier Name	Address	Postcode	City	State	First Name	Last Name
1	13	Example.com	15 Example St	9999	Brisbane	QLD	John	Doe
2	16	New Example Services	15 Test Drive	6789	Melbourne	VIC	Jane	Test

Is there any way to do this within Azure Data Factory? Alternatively, I can connect to an Azure SQL Database to run any SQL commands that me be able to do the work.

Many thanks in advance.

Edit: In some cases, the unwanted data may be of a similar type or value to the required value in the column. For example, in the column "First Name" there may be Joe and John for one particular ID. However, the desired value is in the exact same location relative to every ID. That is, the desired first name is in the third row of each ID.

NiharikaMoola · Accepted Answer

As @Nick.McDermaid, @Anand Sowmithiran mentioned in the comments, identify the unwanted data and you can replace unwanted data/uwanted data in each column with blank or NULL and apply max() function to get the aggregated values.

You can write a query directly in the Azure SQL database to get the expected results as below.

select id ID, 
  max(case when [Supplier Number] = 'Uwanted Data' or [Supplier Number] = 'Unwanted Data' then '' else [Supplier Number] end) [Supplier Number], 
  max(case when [Supplier Name]= 'Uwanted Data' or [Supplier Name] = 'Unwanted Data' then ''  else [Supplier Name] end) [Supplier Name],
  max(case when [Address] = 'Uwanted Data' or [Address] = 'Unwanted Data' then '' else [Address] end) [Address],
  max(case when [Postcode]= 'Uwanted Data' or [Postcode] = 'Unwanted Data' then '' else [Postcode] end) [Postcode],
  max(case when [City]= 'Uwanted Data' or [City] = 'Unwanted Data' then '' else [City] end) [City],
  max(case when [State]= 'Uwanted Data' or [State] = 'Unwanted Data' then '' else [State] end) [State],
  max(case when [First Name]= 'Uwanted Data' or [First Name] = 'Unwanted Data' then '' else [First Name] end) [First Name],
  max(case when [Last Name]= 'Uwanted Data' or [Last Name] = 'Unwanted Data' then '' else [Last Name] end) [Last Name]
from tb1
group by id

You use the same query in the Azure data factory copy activity source, by using the query option.

This gives your expected results only when there is a single correct value in a column per ID. It gives different results if there are more than 1 correct value in a column per ID value.

How to remove unwanted row values and then merge rows based on a key in ADF/Azure SQL DB?

Answers (1)

Related Questions