Zebediah Jenkins

Reputation: 21

Azure Data Factory Not Properly Processing Delimited File With Escape Characters

I'm trying to load files from SAP ECC using Azure Data Factory, so I have a Data Flow and a pipeline that executes that Data Flow. Everything goes smoothly until I try to load the file into a SQL Server table, because schema drift occurs on the way from ECC into Blob Storage. When I run the pipeline, I receive an error saying the column set I'm trying to load is larger than the column set of the destination table.

A co-worker suggested just adding a few extra columns to the destination, but that change would have to be made to 70+ other tables and stored procedures. I'd rather handle it in the pipeline.

The file has a .txt extension but is comma-separated, with each column enclosed in double quotes. Here is an example of the data causing the issue:

"2358UD ECO LINE 40",25NM,RED"

Here are the dataset settings I'm using:

[screenshot of the dataset settings]

I would expect a backslash before the embedded commas as well, but there isn't one. I'm at a loss for how to ingest this data.
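For illustration, here is a minimal sketch of how a quote-aware parser handles that row (Python's csv module is used here only to demonstrate the splitting; it is not part of the ADF pipeline). Because the embedded quote is not escaped, the row parses as three fields instead of one, which is why the incoming column set ends up larger than the destination table's:

import csv
import io

# The raw line as it arrives from SAP ECC; the whole string is intended
# to be a single column value, but it contains bare commas and an
# unescaped double quote.
raw_line = '"2358UD ECO LINE 40",25NM,RED"'

# A standard parser with quote char " and no escape char splits it into
# three fields instead of one.
fields = next(csv.reader(io.StringIO(raw_line), delimiter=',', quotechar='"'))
print(fields)  # ['2358UD ECO LINE 40', '25NM', 'RED"']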

I am happy to provide more details as well. Any help would be greatly appreciated!

Upvotes: 0

Views: 105

Answers (1)

Aswin

Reputation: 7116

Your data contains both the quote character and the delimiter inside the values themselves. You can make ADF ignore delimiters inside a value by defining the quote character, but there is no way to ignore a stray quote character inside the data. Since the issue is in the data itself, it would otherwise have to be fixed manually. A workaround is to copy the data from SAP ECC to a parquet file in Blob Storage and then use that parquet file as the source for the copy into SQL Server.

I tried copying the same sample data to a csv file and a parquet file. Below is the output.

Sample data:

"2358UD ECO LINE 40",25NM,RED","col2,col1"
"data1","data2"
"dat3","data4"

Data preview of the csv: the data is split into multiple columns.

Data preview of the parquet: the data is copied into the parquet file exactly as it appears in the source.
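As a local sketch of why the parquet route works (pyarrow is assumed here; in ADF the actual transfer is done by copy activities): parquet stores each value as-is instead of re-parsing delimiters and quotes, so the messy value survives a round trip unchanged.

import pyarrow as pa
import pyarrow.parquet as pq

# The problem value, kept exactly as it appears in the source data.
messy = '2358UD ECO LINE 40",25NM,RED'

# Build a small table matching the sample above and write it to parquet.
table = pa.table({'col1': [messy, 'data1', 'dat3'],
                  'col2': ['col2,col1', 'data2', 'data4']})
pq.write_table(table, 'sample.parquet')

# Reading it back returns the value byte for byte; nothing is split on
# the embedded commas or the stray quote.
readback = pq.read_table('sample.parquet')
assert readback.column('col1')[0].as_py() == messy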

Upvotes: 0
