shannonjk
shannonjk

Reputation: 45

A few conundrums with replacing quotes and special characters in CSV files

I am having a bit of a conundrum working with some CSV files that need to be cleansed and loaded into a database.

I am fairly adept with PowerShell, but poor with regular expressions, and csv column manipulation.

Here is the issue I am having; there is a 'notes' field in the CSV file I am working with, that can have all sorts of various characters. The main problem is that I need to remove the line feeds, and quotes WITHIN the field, but leave the regular line feeds and text qualifying quotes where they should be. I can remove the line feeds and quotes throughout the file, but not specifically down to the characters within the field.

I have tried working with regular expressions to do this, but am not having much luck, and honestly, I am not that adept with regular expressions. I am hoping someone here will be able to help with this!

Edit: here is the example data

"123"   ""  "2017-02-13 10:26:08" "123456789"   "2017-02-10"    "No"    "Yes"   "Yes"   "No"    "sa‌​mple text 
<crlf> ""additional text""
<crlf> 
<crlf> "    "Y" <crlf>

this should simply be one line with no except at the end.

Upvotes: 2

Views: 136

Answers (1)

woxxom
woxxom

Reputation: 73526

The built-in Import-Csv cmdlet correctly imports multiline and quoted values.

Your file is tab-delimited so we'll specify "`t":

Import-Csv c:\file.csv -Delimiter "`t" | ForEach {
    $_.notes = $_.notes -replace '"', '' -replace '[\r\n]+', ' '
    $_
} | Export-Csv c:\output.csv -Delimiter "`t" -NoTypeInformation -Encoding UTF8

Upvotes: 1

Related Questions