Reputation: 57

Importing Excel With Duplicate Column Names into Azure SQL Using Azure Data Factory

I am quite new to using Azure so pardon me for this basic question. I am trying to import a scheduled Excel file that is sent everyday to my email address into my Azure SQL database. So far, I have managed to use Azure Logic App to save the file into a blob. I intend to use Azure Data Factory to copy the blob into my Azure SQL Database (as after I import in, I will be using ADF to do incremental load).

However, my Excel is generally in this table format (this is system generated so I have no control to edit the source). The problem comes when my source has duplicated column headings such as 'Name', but the data itself is not duplicated; they are representing different things. Eg. 'Name' can refer to Country and another column 'Name' refers to Continent. Note: I am able to differentiate it on my end to know which column index to match to which in my database.

Sample Excel File

What I have done : I have created a dataset in ADF without first row as header (as it will cause problems) and then created a dataflow to filter out the column headers before mapping my source to my sink successfully.

What I am looking for is perhaps is there a more elegant solution to this? I have read that it is possible to change the column header names of my source using import schema when creating the dataset. But when I tried to import an excel as the schema with the column headers I want, this error comes out (Assuming I am trying to import one of the 'Names' as 'Country').

ErrorCode=ExcelInvalidColumnName,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Invalid column name 'Country' is specified while read data from worksheet 'Page 1'.,Source=Microsoft.DataTransfer.ClientLibrary,'

Update: Here's the screenshot of the copy data activity where the mapping source column is empty and I have to individually select several duplicated column names

Duplicated Column

Import Schema Dataset

Upvotes: 0

Answers (2)

Jochen Juelke

Reputation: 11

to add on this regarding copyactivity-behaviour with duplitate column names in source: also the "mapping" section numbers the duplicate columns (starting with 0) and also the datset preview does so.. you CAN'T use "auto-create table" in SINK of copy activity because strangely the generater for auto-create table seems to apply not the "numbering-logic" for dupliate column-names

Upvotes: 0

Jason Welch

Reputation: 986

You can just use a Copy Data activity.

Let it pull in the first row with the headers (I made my csv have several columns called thing).

Then on the mapping tab of Copy Data, click Import Schemas

It will assign unique names to your duplicate column headings, and you can over-type the default output column names like this ...

Upvotes: 0

Importing Excel With Duplicate Column Names into Azure SQL Using Azure Data Factory

Answers (2)

Related Questions