John Long
John Long

Reputation: 11

Handle a CSV file with single quoted string and occasional comma or single quote within the single quoted string

I have a CSV file with text columns quoted in single quote around it and other non text columns don't have a quote around it.The text columns might have a comma or single quote within these single quoted text columns. I found a script online but it doesn't handle this kind of situation.

Is there way to handle this in PowerShell?

Example:

123,678.89,'hello there1', '[email protected]', 'abc,nds'\n 
123,678.89,'hello 'there2', '[email protected]', 'akiu-'nds'\n

Output:

123,678.89|hello there1|[email protected]|abc,nds \n 
123,678.89|hello 'there2|[email protected]|akiu-'nds \n

Example 2:
123,6272,678.89,,,'hello ,there1',,,,'abc1','tw,es',,'[email protected]',,,,,,'abc,nds1'\n 124,8272,928.89,,,,'hello 'there2',,,'abc2','twes',,,'[email protected]',,'biej',,,'abc'nds2'\n 125,9272,328.89,,'hello 'there3',,'abc3',', outyi',,,,'[email protected]',,,,,,'ahct','abc'nds3'\n

Output:

123|6272|678.89|||hello ,there1||||abc1|tw,es||[email protected]||||||abc,nds1\n 124|8272|928.89||||hello 'there2|||abc2|twes|||[email protected]||biej|||abc'nds2\n 125|9272|328.89||hello 'there3||abc3|, outyi||||[email protected]||||||ahct|abc'nds3\n

Upvotes: 1

Views: 1895

Answers (2)

Matt
Matt

Reputation: 46700

Similar to Kiran's answer. There are a couple of things that need to change so I don't think that there is a one size fits all solution. We need to chain these couple of changes. First being the commas that are actually delimiters and second the special end of line character sequence.

$path = "c:\temp\file.csv"
$newDelimiter = "|"
(Get-Content $path) -replace "'\s*?,\s?'|,\s?'|'\s?,",$newDelimiter -replace "'\s*?\\n$","\n" | Set-Content $path

I have a regex101 link that explains with more detail. The regex doing the greater work is the first with three potential alternate matches. This effectively ignores quotes that are off by themselves. If there is data that has a quote and comma combo then I think it would be following to program this without more information.

  • '\s*?,\s?': Comma enclosed in quotes optionally surrounded by variant white-space.
  • ,\s?': Comma with optional space followed by a quote
  • '\s?,: Quote with optional space followed by a comma

So a match of any of the above groups would be replaced with $newDelimiter. Second regex is just looking for '\n$ while accounting for potential optional white-space between the quote and \n that happens at the end of the line. This is how the last single quote is removed.

Upvotes: 1

Kiran Reddy
Kiran Reddy

Reputation: 2904

something like this?

Get-Content C:\temp\file.txt | 
   ForEach-Object {$_ -replace ",'|',\s+'",'|' -replace "'\\n",' \n'} | 
       Set-Content C:\temp\newfile.txt 

Note: the code is just one line, broken into 3 separate lines to read better.

Upvotes: 0

Related Questions