Tim Dunkley
Tim Dunkley

Reputation: 75

sed statement to change/modify CSV separators and delimiters

I have some CSV files which contains comma seperated values and some of the column values can contain characters like ,.<>!/\;&

I am trying to convert the CSV to be a comma separated, quote enclosed CSV

Example Data:

DateCreated,DateModified,SKU,Name,Category,Description,Url,OriginalUrl,Image,Image50,Image100,Image120,Image200,Image300,Image400,Price,Brand,ModelNumber
2012-10-19 10:52:50,2013-06-11 02:07:16,34,Austral Foldaway 45 Rotary Clothesline,Home & Garden > Household Supplies > Laundry Supplies > Drying Racks & Hangers,"Watch the Product Video            Plenty of Space to Hang a Family Wash  Austral's Foldaway 45 rotary clothesline is a folding head rotary clothes hoist beautifully finished in either Beige or Heritage Green.  Even though the Foldaway 45 is compact, you still get a large 45 metres of line space, big enough for a full family wash.  If you want the advantage of a rotary hoist, but dont want to lose your yard, then the Austral Foldaway 45 is the clothesline for you.&nbsp;  Installation Note:&nbsp;A core hole is only required when installing into existing concrete, e.g. a pathway. Not required in the ground(grass/soil).  To watch video on YouTube, click the following link:&nbsp;Austral Foldaway 45 Rotary Clothesline      &nbsp;            //           Customer Video Reviews  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;",https://track.commissionfactory.com.au/p/10604/1718695,http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/,http://content.commissionfactory.com.au/Products/7228/1718695.jpg,http://content.commissionfactory.com.au/Products/7228/[email protected],http://content.commissionfactory.com.au/Products/7228/[email protected],http://content.commissionfactory.com.au/Products/7228/[email protected],http://content.commissionfactory.com.au/Products/7228/[email protected],http://content.commissionfactory.com.au/Products/7228/[email protected],http://content.commissionfactory.com.au/Products/7228/[email protected],309.9000 AUD,Austral,FA45GR

And the output I'm trying to achieve is

"DateCreated","DateModified","SKU","Name","Category","Description","Url","OriginalUrl","Image","Image50","Image100","Image120","Image200","Image300","Image400","Price","Brand","ModelNumber"
"2012-10-19 10:52:50","2013-06-11 02:07:16","34","Austral Foldaway 45 Rotary Clothesline","Home & Garden > Household Supplies > Laundry Supplies > Drying Racks & Hangers","Watch the Product Video            Plenty of Space to Hang a Family Wash  Austral's Foldaway 45 rotary clothesline is a folding head rotary clothes hoist beautifully finished in either Beige or Heritage Green.  Even though the Foldaway 45 is compact, you still get a large 45 metres of line space, big enough for a full family wash.  If you want the advantage of a rotary hoist, but dont want to lose your yard, then the Austral Foldaway 45 is the clothesline for you.&nbsp;  Installation Note:&nbsp;A core hole is only required when installing into existing concrete, e.g. a pathway. Not required in the ground(grass/soil).  To watch video on YouTube, click the following link:&nbsp;Austral Foldaway 45 Rotary Clothesline      &nbsp;            //           Customer Video Reviews  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;","https://track.commissionfactory.com.au/p/10604/1718695","http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/","http://content.commissionfactory.com.au/Products/7228/1718695.jpg","http://content.commissionfactory.com.au/Products/7228/[email protected]","http://content.commissionfactory.com.au/Products/7228/[email protected]","http://content.commissionfactory.com.au/Products/7228/[email protected]","http://content.commissionfactory.com.au/Products/7228/[email protected]","http://content.commissionfactory.com.au/Products/7228/[email protected]","http://content.commissionfactory.com.au/Products/7228/[email protected]","309.9000 AUD","Austral","FA45GR"

Any assistance is GREATLY appreciated.

Upvotes: 3

Views: 2095

Answers (3)

rabensky
rabensky

Reputation: 2934

First, lets try the trivial (and "not good enough") solution that just adds a double quote to each field (including those that already have double quotes! Which isn't what you want)

sed -r 's/([^,]*)/"\1"/g'

Great, the first part looks for sequences with no commas in them, the second part adds double quotes around them, the final 'g' means doing it more than once per line

This will turn

abc,345, some words ,"some text","text,with,commas"

into "abc","345"," some words ",""some text"",""text","with","commas""

A few things to note:

  • it correctly surrounds "some words" with space between them, BUT also surrounds the initial and final spaces. I assume that's OK but if not it can be fixed

  • If the field already had quotes, it will be quoted again, which is BAD. Needs to be fixed

  • if the field already had quotes AND the inner text had commas (which shouldn't be considered field separators) these commas are also quoted. This too needs to be fixed

So we want to match two different regexps - either there was a quoted string or a field with no commas:

sed -r 's/([^,"]*|"[^"]*")/"\1"/g'

The result would now be

"abc","345"," some words ",""some text"",""text,with,commas""

As you can see, we have a double quote on the originally quoted text. This we will have to remove with a second sed command:

sed -r 's/([^,"]*|"[^"]*")/"\1"/g' | sed 's/""/"/g'

Which results in

"abc","345"," some words ","some text","text,with,commas"

YAY!

Upvotes: 3

Birei
Birei

Reputation: 36262

Try this solution. It's superior to my previous , because now I use a parser that handles those commas inside fields correctly. It's neccesary the module Text::CSV_XS to work:

#!/usr/bin/env perl

use strict;
use warnings;
use Text::CSV_XS;

die qq|Usage: perl $0 <csv-file>\n| unless @ARGV == 1;

open my $fh, '<', shift or die qq|ERROR: Could not open input file\n|;

my $csv = Text::CSV_XS->new( {
        always_quote => 1,
} );

while ( my $row = $csv->getline( $fh ) ) { 
        $csv->print( *STDOUT, $row );
        print "\n";
}
$csv->eof;
close $fh;

Upvotes: 0

cforbish
cforbish

Reputation: 8819

It sounds like you want every line in a file to begin and end with a double quote. If so this should work:

sed -i.bak 's/^\(.*\)$/"\1"/' filename

Upvotes: 0

Related Questions