CodeMonkey

Reputation: 477

How to remove \r\n line breaks in a text file that are within quotes and not the end of the row

I have a large set of files in which one column contains line breaks. The values in that column are wrapped in quotes, but U-SQL cannot process the files because it treats each \r\n as the end of the row, even when it falls inside quotes.

Is there an easy way to fix these files other than opening each one individually in something like Notepad++? It seems there should be a way to ignore line breaks that are contained within quotes.

Example is something like this:
1,200,400,"123 street","123 street,\r\nNew York, NY\r\nUnited States",\N,\N,200\r\n

Notepad++ works fine for finding and replacing values manually, but I'm trying to find a batch way to do this because I have multiple files (50+ per source table) and hundreds of thousands of records in each that I need to fix.

Upvotes: 1

Views: 1326

Answers (1)

rickvdbosch

Reputation: 15621

According to U-SQL GitHub issue 84, "USQL and embedded newline characters", you can either build a custom extractor or try the escapeCharacter parameter of the built-in extractor:

USING Extractors.Csv(quoting : true, escapeCharacter : '\\') // quoting is true by default, but it does not hurt to repeat.
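If neither option works for your files, pre-processing them in batch before they reach U-SQL is another route. The following is only a minimal Python sketch, not something taken from the GitHub issue. It assumes the files are UTF-8, that each file fits in memory, that quotes inside a value are doubled (""), and that replacing an embedded line break with a space is acceptable. The function names, the *.csv pattern, and the output folder are made up for illustration.

```python
from pathlib import Path


def strip_quoted_newlines(text: str, replacement: str = " ") -> str:
    """Replace line breaks that occur inside double-quoted fields.

    Line breaks outside quotes (the real row terminators) are kept.
    A doubled quote ("") toggles the state twice, so it is left intact.
    """
    out = []
    in_quotes = False
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch in "\r\n":
            # Treat \r\n as a single line break.
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1
            out.append(replacement)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def fix_folder(src_dir: str, out_dir: str, pattern: str = "*.csv") -> int:
    """Write a cleaned copy of every matching file in src_dir to out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob(pattern)):
        # newline="" stops Python from translating \r\n on read or write.
        with open(path, "r", encoding="utf-8", newline="") as src:
            cleaned = strip_quoted_newlines(src.read())
        with open(out_path / path.name, "w", encoding="utf-8", newline="") as dst:
            dst.write(cleaned)
        count += 1
    return count
```

Calling fix_folder("input", "cleaned") would leave the originals untouched and write the cleaned copies to a separate folder, which you could then point the built-in extractor at.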

Upvotes: 1
