Reputation: 11
I have a CSV file with text columns quoted in single quote around it and other non text columns don't have a quote around it.The text columns might have a comma or single quote within these single quoted text columns. I found a script online but it doesn't handle this kind of situation.
Is there way to handle this in PowerShell?
Example:
123,678.89,'hello there1', '[email protected]', 'abc,nds'\n
123,678.89,'hello 'there2', '[email protected]', 'akiu-'nds'\n
Output:
123,678.89|hello there1|[email protected]|abc,nds \n
123,678.89|hello 'there2|[email protected]|akiu-'nds \n
Example 2:
123,6272,678.89,,,'hello ,there1',,,,'abc1','tw,es',,'[email protected]',,,,,,'abc,nds1'\n
124,8272,928.89,,,,'hello 'there2',,,'abc2','twes',,,'[email protected]',,'biej',,,'abc'nds2'\n
125,9272,328.89,,'hello 'there3',,'abc3',', outyi',,,,'[email protected]',,,,,,'ahct','abc'nds3'\n
Output:
123|6272|678.89|||hello ,there1||||abc1|tw,es||[email protected]||||||abc,nds1\n 124|8272|928.89||||hello 'there2|||abc2|twes|||[email protected]||biej|||abc'nds2\n 125|9272|328.89||hello 'there3||abc3|, outyi||||[email protected]||||||ahct|abc'nds3\n
Upvotes: 1
Views: 1895
Reputation: 46700
Similar to Kiran's answer. There are a couple of things that need to change so I don't think that there is a one size fits all solution. We need to chain these couple of changes. First being the commas that are actually delimiters and second the special end of line character sequence.
$path = "c:\temp\file.csv"
$newDelimiter = "|"
(Get-Content $path) -replace "'\s*?,\s?'|,\s?'|'\s?,",$newDelimiter -replace "'\s*?\\n$","\n" | Set-Content $path
I have a regex101 link that explains with more detail. The regex doing the greater work is the first with three potential alternate matches. This effectively ignores quotes that are off by themselves. If there is data that has a quote and comma combo then I think it would be following to program this without more information.
'\s*?,\s?'
: Comma enclosed in quotes optionally surrounded by variant white-space. ,\s?'
: Comma with optional space followed by a quote'\s?,
: Quote with optional space followed by a commaSo a match of any of the above groups would be replaced with $newDelimiter
. Second regex is just looking for '\n$
while accounting for potential optional white-space between the quote and \n
that happens at the end of the line. This is how the last single quote is removed.
Upvotes: 1
Reputation: 2904
something like this?
Get-Content C:\temp\file.txt |
ForEach-Object {$_ -replace ",'|',\s+'",'|' -replace "'\\n",' \n'} |
Set-Content C:\temp\newfile.txt
Note: the code is just one line, broken into 3 separate lines to read better.
Upvotes: 0