Reputation: 33
I need to manage smtp logfile handling in my company.
These logfiles need to be imported to MSSQL, so it is my job to provide this data.
I got strange undelivery message with a ";" in the string, I need to replace this with a comma.
So what I got:
Sender;Recipient;Operation;Answer;Error;Servername
[email protected];[email protected];RCPT TO;450;+4.2.0+<[email protected]>:+Recipient+address+rejected:+Policy+restrictions;+try+later;M0641
Mention the ";" in the Answer field after "restrictions", dunno why the mail server sends semicolons, maybe to annoy me :P
I tried following with awk after I did a lot of research:
awk 'BEGIN{FS=OFS=";"} {for (i=5;i<=NF;i++) gsub (";",",",$i)} 1' myfile.csv
This command actually works but it seems it does nothing with my file, the ";" in the error field remains. What I am missing here ?
Upvotes: 3
Views: 3274
Reputation: 67467
I think your problem is replacing the unquotes delimiters in your logical 4th field in a five field wide input. Although this script is repetitious should be easier to understand
$ awk '{n=split($0,a,";");
for(i=1; i<4; i++) printf "%s;", a[i];
for(i=4; i<n-1; i++) printf "%s,", a[i];
printf "%s;%s\n", a[n-1], a[n]}' file
A better way to write the same based on @Ed Morton's comments
$ awk -F';' '{for(i=1; i<NF-1; i++) printf "%s"(i<4?FS:","), $i;
print $(NF-1) FS $NF}' file
For the input
1;2;3;4a;4b;4c;5
1;2;3;4;5
it generates
1;2;3;4a,4b,4c;5
1;2;3;4;5
Upvotes: 1
Reputation: 37404
If your fifth ;
should be removed, append $6 to $5 and advance accordingly. This could be done with for
loop (there are examples in SO) but since the fault is so near the end, we'll just do this in a simpler way:
$ awk 'BEGIN {FS=OFS=";"} NR==1 {nf=NF} NF==(nf+1) {$5=$5 "," $6; $6=$7; NF=nf} 1' file
Explained:
BEGIN {FS=OFS=";"} # set separator
NR==1 {nf=NF} # get field count from the first record (6)
NF==(nf+1) { # if record is one field longer:
$5=$5 "," $6 # append $6 to $5, comma-separated
$6=$7 # set $7 (NF) to $6 (nf)
NF=nf # reset NF
} 1 # output
Testing: Running the program and sending the output to cut -d\; -f 5
outputs:
Error
+4.2.0+<[email protected]>:+Recipient+address+rejected:+Policy+restrictions,+try+later
Upvotes: 0
Reputation: 113834
;
with ,
$ awk -F\; '{for (i=1;i<=NF;i++) printf "%s%s",$i,(i==NF?ORS:(i<=4?";":","))}' myfile.csv
Sender;Recipient;Operation;Answer;Error,Servername
[email protected];[email protected];RCPT TO;450;+4.2.0+<[email protected]>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
How it works:
-F\;
This sets the field separator for input to ;
.
for (i=1;i<=NF;i++) printf "%s%s",$i,(i==NF?ORS:(i<=4?";":","))
This loops over every field and prints the field followed by (a) ORS if we are on the last field, or (b) ,
if were are on field 5 or later, or (c) ;
if we are on one of the first four fields.
;
with ,
Try:
$ awk -F\; '{$1=$1} 1' OFS=, myfile.csv
Sender,Recipient,Operation,Answer,Error,Servername
[email protected],[email protected],RCPT TO,450,+4.2.0+<[email protected]>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
How it works:
-F\;
This sets the field separator on input to a semicolon.
$1=$1
This causes awk to think the the line has been changed so that awk will update the output line to use the new field separator.
1
This tells awk to print the line.
OFS=,
This sets the field separator on output to a comma.
$ awk '{gsub(/;/, ",")} 1' myfile.csv
Sender,Recipient,Operation,Answer,Error,Servername
[email protected],[email protected],RCPT TO,450,+4.2.0+<[email protected]>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
$ sed 's/;/,/g' myfile.csv
Sender,Recipient,Operation,Answer,Error,Servername
[email protected],[email protected],RCPT TO,450,+4.2.0+<[email protected]>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
Upvotes: 5
Reputation: 203324
If the offending semi-colons only appear in your 5th field then you can do this using GNU awk for the 3rd arg to match():
$ awk 'match($0,/(([^;]+;){4})(.*)(;[^;]+$)/,a){gsub(/;/,",",a[3]); print a[1] a[3] a[4]}' file
[email protected];[email protected];RCPT TO;450;+4.2.0+<[email protected]>:+Recipient+address+rejected:+Policy+restrictions,+try+later;M0641
Upvotes: 0