Reputation: 35

How can I bring multiline data in quotes back into one line?

I have a CSV file that looks like this. Note that not all data entries are multiline!

225253;abc;def;ghi;"- sometext 
- sometext
- 3sometext
";asd,asd;58.2500;False;False;False;17;0.0000;;

My goal is to use a bash script in order to convert it into this form:

225253;abc;def;ghi;"- sometext - sometext - 3sometext";asd,asd;58.2500;False;False;False;17;0.0000;;

My first guess was this. But somehow it won't work...

sed -e 's/\"\([^"]+\)\"//g'

Upvotes: 2

Answers (5)

Mindaugas Kubilius

Reputation: 341

You can also try this sed script. It matches lines that contain an opening double quote " with no field separator ; between it and the end of the line, on the assumption that newlines should be removed only when they occur inside double-quoted fields. It also assumes that the whole field is quoted (or at least that it ends like "stuff";). If that's not the case, a slight adjustment is needed.

:again
/"[^;]*$/ {
  N
  s/\n//
  b again
}
s/"[^;]*";/;/g

Put it into script.sed and run it like this: sed -r -f script.sed file

If the quoted fields should be kept, just delete the last line of the script.
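
As a quick check, the join-only variant (the script without its last line) can be run against the sample from the question. This is only a sketch: the temporary directory and the file names sample.csv and join.sed are placeholders chosen for the demo, and -r is left out because the script uses no extended regex syntax.

```shell
#!/bin/sh
# Work in a throwaway directory; the file names are placeholders.
tmp=$(mktemp -d)

# Sample record from the question, with its line breaks inside the quotes.
printf '%s\n' \
  '225253;abc;def;ghi;"- sometext ' \
  '- sometext' \
  '- 3sometext' \
  '";asd,asd;58.2500;False;False;False;17;0.0000;;' > "$tmp/sample.csv"

# Join-only variant: the final substitution is omitted so the
# quoted field is kept in the output.
cat > "$tmp/join.sed" <<'EOF'
:again
/"[^;]*$/ {
  N
  s/\n//
  b again
}
EOF

joined=$(sed -f "$tmp/join.sed" "$tmp/sample.csv")
printf '%s\n' "$joined"
rm -r "$tmp"
```

Because the newline is deleted rather than replaced, the joined field reads "- sometext - sometext- 3sometext"; changing s/\n// to s/\n/ / would put a space at each join instead.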

Upvotes: 0

Håkon Hægland

Reputation: 40748

awk '{gsub(/\n/,"")}1' FS=";" RS=";" ORS=";" file

gives

225253;abc;def;ghi;"- sometext - sometext- sometext";asd,asd;58.2500;False;False;False;17;0.0000;;

Update

Or using patsplit in GNU Awk version 4:

BEGIN { FS=RS=";"}
{
    if (patsplit($0,a,/"[^"]+"/,s)) {
        gsub(/\n/,"",a[1])
        printf "%s%s%s", s[0],a[1],s[1]
    }
    else
        printf "%s", $0
    printf ";"
}

This will only remove newlines inside double quotes.

Upvotes: 0

janos

Reputation: 124646

The clean way to do this is using Text::CSV, as @JonathanLeffler suggested in comments, or something equivalent to that. That is, using a library dedicated to processing CSV files. See my Perl implementation at the bottom.

However, Text::CSV is usually not installed by default, so you might have to install it yourself. If that's not an option or too difficult for you, then a less perfect but simpler awk solution might be good enough, based on a similar question:

awk -F ";" -v nf=14 'NF < nf { line = line (line ? OFS : "") $0; fields += NF } fields >= nf { print line; line=""; fields=0 } NF == nf'

For reference, the Perl solution using Text::CSV:

use Text::CSV;

my $sep = ';';
my $csv = Text::CSV->new({ binary => 1, sep_char => $sep });

while (my $row = $csv->getline(*STDIN)) {
    print join($sep, map { s/\n$//; s/ *\n/ /g; $_ } @$row), "\n";
}

Save this in a file transform-csv.pl and run it with:

perl transform-csv.pl < sample.csv

Upvotes: 1

Hynek -Pichi- Vychodil

Reputation: 26121

perl -MText::CSV_XS -e'my $csv = Text::CSV_XS->new({binary=>1,sep_char=>";"});while(my $row = $csv->getline(ARGV)){$csv->print(STDOUT,[map s/\n/ /gr,@$row]);print "\n"}'

Or you can use Text::CSV as well. Add parameters to the parser constructor to tune other behavior.

Upvotes: 0

potong

Reputation: 58391

This might work for you (GNU sed):

sed -r ':a;/^[^"]*("[^"]*")+(;[^"]+|$)/b;$!{N;s/\n//;ba}' file

This looks for lines with matching quotes. If it finds a line with non-matching quotes, it appends the next line, removes the newline, and repeats until the quotes match or it reaches the end of the file.

N.B. This does not cater for quotes within quotes.
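
The same quote-balancing idea can be sketched in awk, for cases where a sed loop is hard to read. This is an illustration rather than a drop-in replacement: join_quoted is a hypothetical helper name, and the sketch simply counts double quotes, so it treats a record as complete whenever the count is even (which also holds for doubled "" escapes inside a field).

```shell
#!/bin/sh
# join_quoted is a hypothetical helper name, not part of any answer above.
# It reads CSV on stdin and joins physical lines until the number of
# double quotes seen in the pending record is even.
join_quoted() {
  awk '
    {
      # Continue the pending record, or start a new one.
      rec = (pending ? rec $0 : $0)
      # Count quotes on a copy; gsub returns the number of replacements.
      copy = rec
      pending = gsub(/"/, "", copy) % 2
      if (!pending) print rec
    }
    # Flush a record whose quotes never balanced.
    END { if (pending) print rec }
  '
}

# Usage: one multiline record followed by an ordinary one-line record.
result=$(printf '%s\n' \
  '225253;abc;def;ghi;"- sometext ' \
  '- sometext' \
  '- 3sometext' \
  '";asd,asd;58.2500;False' \
  '1;plain;"one line";x' | join_quoted)
printf '%s\n' "$result"
```

Like the sed version, this deletes the newlines without inserting a space, and lines that are already complete pass through unchanged.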

Upvotes: 1
