Reputation: 605
I have the following problem in Azure Data Factory:
In a ADLS I have a CSV file with a linebreak in a value:
A, B, C
a, b, c
a, "b
b", c
This CSV is loaded in a (CSV) Dataset (in ADF) with the following settings; first row is header, quote character double quote ("), column delimiter Comma (,), row delimiter (\r,\n, or \r\n) and escape char backslash ().
The "Preview Data" of the dataset seems to work correct and outputs a table with 2 rows. This is also the output I expect, as the overal structure of the data is preserved.
However, when I try to use this dataset in Mapping Data Flow and select "Data Preview" (in the source node directly), I get the following output:
The linebreak isn't ignored, even as the whole value is between double quotes. The overal structure of the data is now broken as one row is splitted into two.
I get this kind of data when I save an Excel file with linebreaks in some cells as CSV. How should I work around this problem? Should I save the Excel differently, should I try to delete all linebreaks before saving as CSV, or is there a way to let Data Factory solve this problem? Als why does the Preview Data function in a Dataset seems to work correct, where the Data Preview function in Mapping Data Flow does not?
Upvotes: 6
Views: 7363
Reputation: 197
Anyone facing the same issue, now Microsoft has enabled the multiline feature in data flow. Now we can use csv files with multiline values
And if you still having issues check this one
Upvotes: 6
Reputation: 16411
I tried this and there is nothing different between Copy Active and Data Flow source settings.
I uploaded an csv file and changed the data to the same with you in my Blob storage.
Test.csv:
The result in Copy active settings and data preview:
The result in Data flow source data settings and data preview:
Please check your settings in your data flow.
Reference tutorial: Export Table to CSV, it also works for Azrue SQL database.
Update:
I asked Azure Support for help, they tested and have the same error.
Azure Support replied me and give the reason:
Hope this helps.
Upvotes: 3
Reputation: 3838
We are adding multi-line string handling to ADF Data Flows for text-delimited datasets. We'll have that feature enabled within a few iterations.
The workaround until then is to copy the data from your CSV to Parquet and then consume that Parquet dataset through your data flow source transformation.
Upvotes: 2