Reputation: 45
I am having a bit of a conundrum working with some CSV files that need to be cleansed and loaded into a database.
I am fairly adept with PowerShell, but poor with regular expressions, and csv column manipulation.
Here is the issue I am having; there is a 'notes' field in the CSV file I am working with, that can have all sorts of various characters. The main problem is that I need to remove the line feeds, and quotes WITHIN the field, but leave the regular line feeds and text qualifying quotes where they should be. I can remove the line feeds and quotes throughout the file, but not specifically down to the characters within the field.
I have tried working with regular expressions to do this, but am not having much luck, and honestly, I am not that adept with regular expressions. I am hoping someone here will be able to help with this!
Edit: here is the example data
"123" "" "2017-02-13 10:26:08" "123456789" "2017-02-10" "No" "Yes" "Yes" "No" "sample text
<crlf> ""additional text""
<crlf>
<crlf> " "Y" <crlf>
this should simply be one line with no except at the end.
Upvotes: 2
Views: 136
Reputation: 73526
The built-in Import-Csv
cmdlet correctly imports multiline and quoted values.
Your file is tab-delimited so we'll specify "`t"
:
Import-Csv c:\file.csv -Delimiter "`t" | ForEach {
$_.notes = $_.notes -replace '"', '' -replace '[\r\n]+', ' '
$_
} | Export-Csv c:\output.csv -Delimiter "`t" -NoTypeInformation -Encoding UTF8
Upvotes: 1