Michael H.

Reputation: 605

Mapping Data Flow in Data Factory doesn't ignore line breaks in text values of CSV files

I have the following problem in Azure Data Factory:

In ADLS I have a CSV file with a line break in a value:

A, B, C
a, b, c
a, "b
b", c

This CSV is loaded into a (CSV) Dataset in ADF with the following settings: first row as header, quote character double quote ("), column delimiter comma (,), row delimiter (\r, \n, or \r\n) and escape character backslash (\).

The "Preview Data" of the dataset seems to work correct and outputs a table with 2 rows. This is also the output I expect, as the overal structure of the data is preserved. enter image description here

However, when I try to use this dataset in Mapping Data Flow and select "Data Preview" (directly in the source node), I get a different output:

The line break isn't ignored, even though the whole value is between double quotes. The overall structure of the data is now broken, as one row is split into two.

I get this kind of data when I save an Excel file with line breaks in some cells as CSV. How should I work around this problem? Should I save the Excel file differently, should I try to delete all line breaks before saving as CSV, or is there a way to let Data Factory solve this problem? Also, why does the Preview Data function in a Dataset seem to work correctly, whereas the Data Preview function in Mapping Data Flow does not?

Upvotes: 6

Views: 7363

Answers (3)

sav

Reputation: 197

For anyone facing the same issue: Microsoft has now enabled the multiline feature in Data Flow, so we can use CSV files with multiline values.

And if you are still having issues, check this one:

https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/data-factory/data-flow-troubleshoot-guide.md#error-code-df-executor-systeminvalidjson-1

Upvotes: 6

Leon Yue

Reputation: 16411

I tried this and there is nothing different between the Copy Activity and Data Flow source settings.

I uploaded a CSV file, Test.csv, with the same data as yours to my Blob storage.

The result in the Copy Activity settings and data preview:

The result in the Data Flow source settings and data preview:

Please check your settings in your data flow.

Reference tutorial: Export Table to CSV; it also works for Azure SQL Database.

Update:

I asked Azure Support for help; they tested it and got the same error.

Azure Support replied to me and gave the reason.

Hope this helps.

Upvotes: 3

Mark Kromer MSFT

Reputation: 3838

We are adding multi-line string handling to ADF Data Flows for text-delimited datasets. We'll have that feature enabled within a few iterations.

The workaround until then is to copy the data from your CSV to Parquet and then consume that Parquet dataset through your data flow source transformation.
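If you want to try that CSV-to-Parquet conversion outside ADF first (for example to validate the data), a minimal sketch with pandas, assuming pyarrow is installed and using hypothetical file names input.csv / output.parquet, looks like this; inside ADF the same step is simply a Copy activity from the delimited-text dataset to a Parquet dataset:

import pandas as pd

# Read the CSV with a quote-aware parser: a line break inside a
# double-quoted value stays part of that value instead of starting a new row.
df = pd.read_csv("input.csv", quotechar='"', skipinitialspace=True)

# Write Parquet (needs pyarrow or fastparquet installed). Parquet stores the
# value with its embedded line break, so row boundaries stay unambiguous downstream.
df.to_parquet("output.parquet", index=False)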

Upvotes: 2
