Reputation: 35
I have a CSV file that looks like this. Note that not all data entries are multiline!
225253;abc;def;ghi;"- sometext
- sometext
- 3sometext
";asd,asd;58.2500;False;False;False;17;0.0000;;
My goal is to use a bash script to convert it into this form:
225253;abc;def;ghi;"- sometext - sometext - 3sometext";asd,asd;58.2500;False;False;False;17;0.0000;;
My first attempt was this, but for some reason it doesn't work:
sed -e 's/\"\([^"]+\)\"//g'
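A quick way to see why this attempt cannot work is to run it piece by piece against the sample. This is a sketch for illustration only. The sample helper function is hypothetical, and sed -E is assumed to be available (GNU and BSD sed both accept it):

```shell
#!/bin/sh
# Sample record from the question (hypothetical helper name).
sample() {
  printf '%s\n' \
    '225253;abc;def;ghi;"- sometext' \
    '- sometext' \
    '- 3sometext' \
    '";asd,asd;58.2500;False;False;False;17;0.0000;;'
}

# 1. sed runs the script once per input line, after stripping that line's
#    newline, so no pattern ever sees the line breaks: still 4 lines.
sample | sed 's/"\([^"]*\)"//g' | wc -l

# 2. In a basic regular expression (sed without -E/-r) "+" is a literal
#    character, so "[^"]+" only matches text such as "x+".
printf '%s\n' '"ab"' | sed 's/"[^"]+"/X/'      # no match: prints "ab"
printf '%s\n' '"ab"' | sed -E 's/"[^"]+"/X/'   # ERE: prints X

# 3. The replacement is empty, so a match deletes the quoted text
#    instead of joining it.
printf '%s\n' 'a;"x y";b' | sed -E 's/"([^"]+)"//g'   # prints a;;b
```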
Upvotes: 2
Views: 1109
Reputation: 341
You can also try this sed script. It matches lines that contain an opening double quote " with no field separator ; after it before the end of the line. It assumes that newlines need to be removed only when they occur inside double-quoted fields, and that the whole field is quoted (at least that it ends like "stuff";). If that's not the case, the script needs a slight adjustment.
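# Loop while the line has a quote with no ";" after it (an open quoted field)
:again
/"[^;]*$/ {
# Append the next line, delete the joining newline and test again
N
s/\n//
b again
}
# Empty every quoted field: "...";  becomes ;
s/"[^;]*";/;/g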
Put it into script.sed and run it like this:
sed -r -f script.sed file
The last line empties the quoted fields; if they should be kept, just delete that line from the script.
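To try the script end to end without a separate file, it can be wrapped in a shell function. This is only a sketch, assuming GNU sed (for -r); the names sample and join_quoted_fields are hypothetical. It also shows the effect of the last line: with it the quoted field is emptied, and without it the field is kept with its newlines simply deleted (no space is inserted):

```shell
#!/bin/sh
# Sample record from the question (hypothetical helper name).
sample() {
  printf '%s\n' \
    '225253;abc;def;ghi;"- sometext' \
    '- sometext' \
    '- 3sometext' \
    '";asd,asd;58.2500;False;False;False;17;0.0000;;'
}

# Hypothetical wrapper around the script above. Pass "keep" as the first
# argument to leave out the final s/// command and keep the quoted fields.
join_quoted_fields() {
  script=':again
/"[^;]*$/ {
N
s/\n//
b again
}'
  if [ "$1" != keep ]; then
    script="$script
s/\"[^;]*\";/;/g"
  fi
  sed -r "$script"
}

sample | join_quoted_fields        # quoted field emptied
sample | join_quoted_fields keep   # quoted field kept, newlines deleted
```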
Upvotes: 0
Reputation: 40748
awk '{gsub(/\n/,"")}1' FS=";" RS=";" ORS=";" file
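gives (the newlines are simply deleted, without inserting a space; and because the input's final newline forms a last record of its own, one extra ";" is printed at the end, with no trailing newline):
225253;abc;def;ghi;"- sometext- sometext- 3sometext";asd,asd;58.2500;False;False;False;17;0.0000;;;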
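To see why this works, here is a sketch that prints the records awk splits a small made-up input (a;"x, a newline, then y";b) into when RS is ";". The newline stays inside the quoted record, where gsub can delete it, and the last record still holds the input's final newline. The | marks where a newline was:

```shell
#!/bin/sh
# Print each record awk sees with RS=";", marking embedded newlines as "|".
printf '%s\n' 'a;"x' 'y";b' |
  awk 'BEGIN { RS = ";" } { gsub(/\n/, "|"); printf "record %d: [%s]\n", NR, $0 }'
```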
Update
Or, using patsplit in GNU Awk version 4:
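BEGIN { FS = RS = ";" }   # one record per ;-separated field
{
  # a[1] = first quoted string in the record, s[0]/s[1] = text around it
  if (patsplit($0, a, /"[^"]+"/, s)) {
    gsub(/\n/, "", a[1])  # delete newlines only inside the quotes
    printf "%s%s%s", s[0], a[1], s[1]
  }
  else
    printf "%s", $0
  printf ";"              # put the separator back
}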
This removes only the newlines inside double quotes.
Upvotes: 0
Reputation: 124646
The clean way to do this is to use Text::CSV, as @JonathanLeffler suggested in the comments, or something equivalent: that is, a library dedicated to processing CSV files. See my Perl implementation at the bottom.
However, Text::CSV is usually not installed by default, so you might have to install it yourself. If that's not an option, or too difficult, then a less perfect but simpler awk solution, based on a similar question, might be good enough:
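awk -F ";" -v nf=14 'NF < nf { line = line (line ? OFS : "") $0; fields += NF } fields >= nf { print line; line=""; fields=0 } NF == nf'
Here nf is the number of fields in a complete record: 14 in the sample, counting the two empty fields after 0.0000. With a smaller value, single-line records are silently dropped.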
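To see what the field counting does with the sample, here is a sketch that prints NF for each physical line and the running total (the sample helper is hypothetical):

```shell
#!/bin/sh
# Sample record from the question (hypothetical helper name).
sample() {
  printf '%s\n' \
    '225253;abc;def;ghi;"- sometext' \
    '- sometext' \
    '- 3sometext' \
    '";asd,asd;58.2500;False;False;False;17;0.0000;;'
}

# NF of each physical line: 5, 1, 1 and 10
sample | awk -F ';' '{ print NF }'

# Total over the whole record: 17
sample | awk -F ';' '{ total += NF } END { print total }'
```

Each embedded newline splits one field into two pieces, so the sample's 14 fields show up as 14 + 3 = 17. The one-liner assumes that running total first reaches nf on the last physical line of the record, which holds here but not for a field broken over many lines.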
For reference, the Perl solution using Text::CSV:
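use strict;
use warnings;
use Text::CSV;
my $sep = ';';
# binary => 1 allows newlines inside quoted fields; eol => "\n" ends each printed record
my $csv = Text::CSV->new({ binary => 1, sep_char => $sep, eol => "\n" });
while (my $row = $csv->getline(\*STDIN)) {
    # Drop a trailing newline, then turn the remaining embedded newlines into spaces
    my @fields = map { s/\n$//; s/ *\n/ /g; $_ } @$row;
    # print() re-quotes fields that need it (e.g. fields containing spaces)
    $csv->print(\*STDOUT, \@fields);
}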
Save this in a file transform-csv.pl and run it with:
perl transform-csv.pl < sample.csv
Upvotes: 1
Reputation: 26121
perl -MText::CSV_XS -e'my $csv = Text::CSV_XS->new({binary=>1,sep_char=>";",eol=>"\n"});while(my $row = $csv->getline(ARGV)){$csv->print(STDOUT,[map s/\n/ /gr,@$row])}' file
or you can use Text::CSV as well. Add parameters to the parser constructor to tune other behavior. (The s///r modifier requires Perl 5.14 or later.)
Upvotes: 0
Reputation: 58391
This might work for you (GNU sed):
sed -r ':a;/^[^"]*("[^"]*"[^"]*)*$/b;$!{N;s/\n//;ba}' file
This looks for lines with balanced quotes. If it finds a line with unbalanced quotes, it appends the next line, removes the newline, and repeats until the quotes balance or the end of the file is reached.
N.B. This does not cater for quotes within quotes.
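For comparison, here is the same balanced-quote idea as a sketch in POSIX awk. It also puts a space where each newline was, so the result matches the desired output in the question. The function name join_multiline_csv is hypothetical, and it assumes that while a quoted field is open, a line starting with a quote carries the closing quote (so it is appended without a space). Like the sed version, it does not treat escaped quotes specially, although doubled "" pairs still keep the count balanced:

```shell
#!/bin/sh
# Sketch: join physical lines until the quotes in the record balance,
# replacing each embedded newline with a space.
join_multiline_csv() {
  awk '
    {
      tmp = $0
      q += gsub(/"/, "", tmp)              # running count of quotes in the record
      if (!open)                           rec = $0
      else if (substr($0, 1, 1) == "\"")   rec = rec $0
      else                                 rec = rec " " $0
      open = (q % 2 == 1)                  # odd count: a quoted field is still open
      if (!open) { print rec; q = 0 }
    }
    END { if (open) print rec }            # flush an unterminated last record
  ' "$@"
}

printf '%s\n' \
  '225253;abc;def;ghi;"- sometext' \
  '- sometext' \
  '- 3sometext' \
  '";asd,asd;58.2500;False;False;False;17;0.0000;;' |
  join_multiline_csv
```

Single-line records pass through unchanged, since their quote count is already even.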
Upvotes: 1