I have a large set of files in which some quoted columns contain line breaks. U-SQL cannot process the files because it treats the embedded \r\n as the end of the row, even though it is inside quotes.
Is there an easy way to fix these files other than opening each one individually in something like Notepad++? It seems there should be a way to ignore line breaks that are contained within quotes.
An example row looks like this:
1,200,400,"123 street","123 street,\r\nNew York, NY\r\nUnited States",\N,\N,200\r\n
Notepad++ works fine for finding and replacing values manually, but I need a batch approach: I have 50+ files per source table, each with hundreds of thousands of records to fix.
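A quote-aware CSV parser does treat the example as a single record. As a quick check outside U-SQL, Python's standard `csv` module (used here only as an illustration; it is not what U-SQL uses) parses the example row into eight fields, keeping the line breaks inside the fifth field:

```python
import csv
import io

# The example row from above, written with real CR/LF characters.
row_text = '1,200,400,"123 street","123 street,\r\nNew York, NY\r\nUnited States",\\N,\\N,200\r\n'

# newline="" disables newline translation, which the csv module needs
# in order to handle line breaks embedded in quoted fields.
rows = list(csv.reader(io.StringIO(row_text, newline="")))

print(len(rows))         # 1
print(len(rows[0]))      # 8
print(repr(rows[0][4]))  # '123 street,\r\nNew York, NY\r\nUnited States'
```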
According to U-SQL GitHub issue 84, "USQL and embedded newline characters", you can either build a custom extractor or try the escapeCharacter parameter of the built-in extractor:
USING Extractors.Csv(quoting : true, escapeCharacter : '\\') // quoting is true by default, but it does not hurt to repeat.
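If you would rather fix the files before U-SQL reads them, a small preprocessing script can do the batch replacement the question asks for. The following is a minimal Python sketch, not something from the linked issue: it scans each file character by character, tracks whether it is inside a double-quoted field, and replaces any \r\n, lone \r, or lone \n found inside quotes with a single space. It assumes UTF-8 input, quotes escaped by doubling them ("") rather than with a backslash (backslash-escaped quotes would confuse the in/out-of-quotes tracking), files small enough to read into memory, and a `*.csv` file naming pattern. The names `fix_csv.py`, `flatten_quoted_newlines`, and `fix_file` are hypothetical.

```python
import sys
from pathlib import Path


def flatten_quoted_newlines(text: str, replacement: str = " ") -> str:
    """Replace CR/LF characters that appear inside double-quoted fields.

    Assumes embedded quotes are escaped by doubling them (""), so flipping
    the in-quotes flag on every quote character tracks the state correctly.
    """
    out = []
    in_quotes = False
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch in "\r\n":
            # Collapse a \r\n pair (or a lone \r or \n) into one replacement.
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1
            out.append(replacement)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def fix_file(src: Path, dst: Path) -> None:
    # newline="" keeps \r\n exactly as stored, both on read and on write.
    with open(src, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(flatten_quoted_newlines(text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python fix_csv.py INPUT_DIR OUTPUT_DIR
    in_dir, out_dir = Path(sys.argv[1]), Path(sys.argv[2])
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(in_dir.glob("*.csv")):
        fix_file(path, out_dir / path.name)
```

Writing to a separate output directory leaves the originals untouched, which matters if a space turns out to be the wrong replacement for the embedded line breaks. Rows outside quotes keep their original \r\n terminators, so the cleaned files can be read with the default extractor settings.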