Reputation: 43
I'm trying to find the best way to move a few specific lines of text to the end of the line above them using PowerShell. The script reads the contents of a CSV and looks for mistakes where somebody has hit the Return key in the middle of typing.
Here is what the content looks like with two slightly different issues. All lines should be five columns long. You can see that two of the lines have been split in the middle. One has a double quote at the end while the other does not.
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS
","WORDS","WORDS" <--Line should be moved to the end of the line above.
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS"
","WORDS","WORDS" <--Line should be moved to the end of the line above AND it needs to throw out one of the double quotes.
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
I've posted the code that I'm using to tidy up the CSVs below. The first line ensures that any single quotes are switched to double quotes and that there is no white space on the end of the lines by trimming. We get many strangely formatted CSVs with a mix of single and double quotes as well as massive amounts of white space at the end of some lines. The second line is supposed to find the following patterns (NEWLINE)"," and "(NEWLINE)"," and replace each with "," so that it trails behind the line above it correctly.
(Get-Content $File).trim() -replace("','",'","') -replace("^'|'$", '"') | Set-Content $File
(Get-Content $File -Raw) -replace("`"[`r`n]`",`"", '","') -replace("[`r`n]`",`"", '","') | Set-Content $File
The first line of code works nicely on its own.
The second -replace on the second line of code seems to work as long as I do not run the first line of code before it. That's a problem as I need to make sure everything is trimmed and using double quotes before running the second line of code.
I have not been able to get the first -replace of the second line of code to work at all yet. The only way I have gotten anything to work is by escaping out of every double quote mark and placing the newline code in square brackets. Is there some way to get all of this to work together correctly? Thanks in advance for any help you can provide.
Upvotes: 1
Views: 1001
Reputation: 439247
Your -replace operations were flawed; try the following instead:
$fileContent = @'
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS
","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS"
","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
'@
$fileContent -replace '(?:"|(.))\r?\n","', '$1","'
The result is:
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
Regex operand, '(?:"|(.))\r?\n","':
(?:"|(.)) is a non-capturing group ((?:...)) that matches either a single " or (|) any other (non-newline) character (.), the latter enclosed in a nested capture group ((...)).
\r?\n matches either a CRLF (\r\n) or an LF-only newline (\n), by optionally (?) matching a \r. If you know that only CRLF newlines are present, you can use \r\n; if you know that only LF newlines are present, you can use \n.
"," matches that string as-is (verbatim).
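To make the role of the capture group concrete, here is a minimal sketch that runs the same pattern against four made-up inputs (the `"A","B"` values and the variable names are assumptions for illustration, not part of the original data). It covers a split with and without a stray closing quote, each with CRLF and LF-only newlines:

```powershell
# Same pattern as above, single-quoted so PowerShell passes it to the regex engine untouched.
$pattern = '(?:"|(.))\r?\n","'

$crlf = "`r`n"
$lf   = "`n"

# Hypothetical samples: field split mid-value, and field split after its closing quote.
$samples = @(
    ('"A","B'  + $crlf + '","C"'),
    ('"A","B"' + $crlf + '","C"'),
    ('"A","B'  + $lf   + '","C"'),
    ('"A","B"' + $lf   + '","C"')
)

$results = foreach ($s in $samples) {
    $m = [regex]::Match($s, $pattern)
    [pscustomobject]@{
        Matched = $m.Success
        Group1  = $m.Groups[1].Value   # 'B' when no quote preceded the newline, '' otherwise
        Fixed   = $s -replace $pattern, '$1","'
    }
}

$results | Format-Table
```

In all four cases `Fixed` should come out as `"A","B","C"`; only `Group1` differs, depending on whether the character before the newline was a quote.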
Replacement operand, '$1","':
$1 refers to what the first (and only) capture group matched: either nothing, if the line ended in ", or the last character on the line otherwise. By following that with verbatim ",", the newline is effectively removed.
As for what you tried:
Assuming that your file has CRLF newlines, the problem with your -replace operation was the subexpression "[`r`n]": it matches only a single character from the set of characters inside [...], i.e., either a CR ("`r") or an LF ("`n"), so it can never match a two-character CRLF sequence as a whole.
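The difference is easy to see on a tiny input. The following sketch uses a made-up three-field string (the values and variable names are illustrative assumptions) and compares a character class with the `\r?\n` sequence:

```powershell
# Hypothetical input: one record split by a CRLF newline.
$text = '"A","B' + "`r`n" + '","C"'

# Character class: consumes exactly one character, here only the LF.
$viaClass = $text -replace '[\r\n]","', '","'

# Optional CR followed by LF: consumes the whole CRLF pair.
$viaSequence = $text -replace '\r?\n","', '","'

$viaClass.Contains("`r")      # True  - a stray CR is left in the middle of the line
$viaSequence.Contains("`r")   # False - the newline is gone completely
```

The character-class version does join the lines, but the leftover CR stays embedded in the data, which is easy to miss when viewing the file.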
Note that the solution above uses the regex escape sequences \r and \n for CR and LF, respectively, which allows use of a single-quoted string ('...') as the regex operand. That prevents confusion between what PowerShell's string interpolation interprets up front and what the regex engine ends up seeing.
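As a rough sketch of how the clean-up step from the question and the regex above might be combined in one pass, the following works on the lines in memory and writes the file only once. The temp-file path and the sample rows are assumptions made up for the demo; substitute your real $File.

```powershell
# Hypothetical temp file standing in for the real CSV path ($File in the question).
$File = Join-Path ([System.IO.Path]::GetTempPath()) 'split-lines-demo.csv'
Set-Content -Path $File -Value @(
    "'A','B','C','D','E'   "
    '"A","B","C'
    '","D","E"'
    '"A","B","C"'
    '","D","E"'
)

# Step 1 (from the question): trim each line and switch single quotes to double quotes.
$lines = (Get-Content -Path $File).Trim() -replace "','", '","' -replace "^'|'$", '"'

# Step 2: join the lines with LF in memory, then repair the split records in one string.
$fixed = ($lines -join "`n") -replace '(?:"|(.))\r?\n","', '$1","'

# Write the repaired records back, one per line.
Set-Content -Path $File -Value ($fixed -split "`n")
Get-Content -Path $File
```

Because the lines are joined in memory with a known newline, the regex does not depend on which newline style Set-Content happens to write. Note that this cannot tell a typing mistake from a quoted field that legitimately starts a new line with ",", so it is only suitable for data like the sample above.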
Upvotes: 1
Reputation: 856
For the example you posted, this solution works:
$lines = @'
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS
","WORDS","WORDS" <--Line should be moved to the end of the line above.
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS"
","WORDS","WORDS" <--Line should be moved to the end of the line above AND it needs to throw out one of the double quotes.
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
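'@ -split '\r?\n'   # split on CRLF or LF, whichever newline style the script file uses
for ($i = 0; $i -lt $lines.Count; $i++) {
    if (($lines[$i] -split '","').Count -ne 5) {
        # Not five columns: either the first half of a broken line (skipped here)
        # or its continuation, which starts with ",
        if ($lines[$i].StartsWith('",')) {
            # Rejoin with the previous line, keeping exactly one quote at the seam.
            $lines[$i-1].TrimEnd('"') + '"' + $lines[$i].TrimStart('"')
        }
    }
    else {
        # Already five columns: emit unchanged.
        $lines[$i]
    }
}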
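Whichever repair is used, a quick sanity check afterwards can catch records that are still broken. This is a minimal sketch that counts columns the same way as the loop above; the `$repaired` array is made-up sample data, with the second entry deliberately left broken:

```powershell
# Hypothetical repaired output; the second entry is intentionally still split.
$repaired = @(
    '"WORDS","WORDS","WORDS","WORDS","WORDS"'
    '"WORDS","WORDS","WORDS'
    '"WORDS","WORDS","WORDS","WORDS","WORDS"'
)

$expected = 5   # all lines should be five columns long
$lineNo   = 0

$problems = foreach ($line in $repaired) {
    $lineNo++
    $count = ($line -split '","').Count
    if ($count -ne $expected) {
        # Report the 1-based line number, the column count found, and the text.
        [pscustomobject]@{ Line = $lineNo; Columns = $count; Text = $line }
    }
}

$problems
```

Counting by splitting on "," is only a rough check: it assumes no field contains that three-character sequence itself.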
Upvotes: 1