Reputation: 31
I have to create an awk
script to achieve the following transformation:
FLAG
column must be splitt in FLAG1
and FLAG2
FLAG1
and FLAG2
are filled given the following conditions:
if the VAL is ":" then NUM is null
if the VAL is ":" and FLAG "c" then NUM is null and FLAG1 is "c"
if the VAL is ":" and FLAG "u" then NUM is null and FLAG2 is "u"
if the VAL is "14,385" and FLAG "d" then NUM is "14385" and FLAG(both) is null
if the VAL is "14,385" and FLAG "du" then NUM is "14385" and FLAG2 is "u"
if the VAL is ":" and FLAG "cd" then NUM is null and FLAG1 is "c"
if the VAL is ":" and FLAG "bc" then NUM is null and FLAG1 is "c" and FLAG2 is "b"
if the VAL is ":" and FLAG "z" then NUM is 0 and FLAG2 is "z"
The csv
input file is:
"PRIM", "TRD", "GTR", "VAL", "FLAG"
"TPP", "T5-78", "HT", ":", c
"TCP", "T5-78", "HT", "12,385", c
"TZP", "T5-78", "HT", ":", z
"TNP", "T5-78", "HT", ":", z
"TNP", "T5-78", "HT", ":", cd
"TNP", "T5-78", "HT", ":", du
"TNP", "T5-78", "HT", "12,524,652", dfg
The output .dat
file should look like this:
PRIM TRD GTR NUM FLAG1 FLAG2
TPP T5-78 HT null c null
TCP T5-78 HT 12385 c null
TZP T5-78 HT 0 null z
TNP T5-78 HT 0 null z
TNP T5-78 HT null c null
TNP T5-78 HT null null u
TNP T5-78 HT 12524652 null dfg
The code I have tried doesn't work properly as only the first 3 requirements are met while the 4th just doesn't work.
BEGIN {
FS=","; OFS="\t";
a["PRIM"]=1;a["TRD"]=1;a["GTR"]=1;a["VAL"]=1;a["FLAG"]=1;
}
NR==1 {
{ $a["VAL"] = "NUMB" ; $a["FLAG"] = "FLAG1" ; $5 = "FLAG2" ; print ; next }
$a["VAL"]=="12,385" && $a["FLAG"] == "d" { $a["VAL"] = "14385" ; $a["FLAG"] = $5 = "" }
$a["VAL"]=="12,385" && $a["FLAG"] == "du" { $a["VAL"] = "14385" ; $a["FLAG"] = "" ; $9 = "u" }
$a["VAL"] != ":" { print ; next }
$a["FLAG"] == "z" { $a["VAL"] = "0" ; $a["FLAG"] = "" ; $5 = "z" }
$a["FLAG"] != "z" { $a["VAL"] = "" }
$NF=substr($NF,1,length($NF)-1);
for(i=1;i<=NF;i++) if($i in a) a[$i]=i;
}
{ print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
NR==1?"FLAG1"OFS"FLAG2":($a["FLAG"]?""OFS$a["FLAG"]:$a["FLAG"]);
Here is the latest code which I think it will work. the problem now which I cannot solve is that the last value (FLAG2) is printed on the second line. I tried to put OFS
but it doesn't fix the problem. Could you please tell me what is wrong in this case.
BEGIN {
FS=",";
OFS="\t";
a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}
NR==1 {
$NF=substr($NF,1,length($NF)-1);
for(i=1;i<=NF;i++)
#if($i in a)
a[$i]=i;
a["FLAG1"] = i;
a["FLAG2"]=i;
a["FLAG1"] = a["FLAG"]; # just for testing and it is ok
a["FLAG2"] = a["FLAG"]; # just for testing and it is ok
}
{
print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];
}
the output is something like
PRIM TRD GTR NUM FLAG1 FLAG2
TPP T5-78 HT null c
null
TCP T5-78 HT 12385 c
null
TZP T5-78 HT 0 null
z
After so many advices this is my last version but it is still unsuccesful... Now when I'm adding if statements for fulfilling the requirements above, nothing happens. I believe the if statements either are not correct nor putted in the right place. printing the values if NR>1 is a catastropha.. could you pls tell me what is wrong with my script? I have to admit that I started with this awk 3 days ago and so far it is painful...the problem is that I have should have finished this script since last week
BEGIN {
FS=",";
OFS="\t";
a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}
NR==1 {
$NF=substr($NF,1,length($NF)-1);
for(i=1;i<=NF;i++)
#if($i in a)
a[$i]=i;
#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];
a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}
{
#initialisation of the new flags
a["FLAG1"]=="";
a["FLAG2"]=="";
}
#MY IF STATEMENTS GO HERE - TEST MODE
a["FLAG"] == "cd" {a["FLAG1"]= "c"}
a["FLAG"] == "du" {a["FLAG2"]= "u"}
{
#print header
print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"], NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];
}
#print content
NR>1
{
for(j=1;j<=NF;j++)
#if($i in a)
a[$j]=j;
#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];
a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}
#MY IF STATEMENTS GO HERE - TEST MODE
a["FLAG"] == "cd" {a["FLAG1"]= "c"}
a["FLAG"] == "du" {a["FLAG2"]= "u"}
{
print $a["PRIM"],$a["TRD"],$a["GTR"],$a["VAL"], $a["FLAG1"], $a["FLAG2"]
}
Upvotes: 3
Views: 223
Reputation: 31
the final working solution is exactly like this one:
BEGIN {
FS=",";
OFS="\t";
}
{
# delete the carriage return character from windows. the magic part
sub(/\015$/,"")
sub(/^"/, "", $1)
sub(/"$/, "", $NF)
}
NR==1{
#$NF=substr($NF,1,length($NF)-1);
for (i=1;i<=NF;i++) col[$i]=i
print "PRIM TRD GTR NUM FLAG1 FLAG2"
next
}
{
f=$col["FLAG"];
v=$col["NUM"];
gsub(/,/, "", v)
gsub(/,/, "", f)
flag1 = "";
flag2 = "";
if(substr(v,1,1) == ":" && substr(f,1,1) == "c")
{
flag1 = "c";
flag2 ="";
}
if ( ! ( substr(f,1,1) == "dz" || substr(f,1,1) =="du" || substr(f,1,1) =="cd" || substr(f,1,1) =="c" || substr(f,1,1) =="z") || substr(f,1,1) =="u" || substr(f,1,1) =="d" ) )
{
flag1 = "";
flag2 =f;
}
print $col["PRIM"],$col["TRD"],$col["GTR"],v,flag1,flag2 ;
}
Upvotes: 0
Reputation: 246744
This requires all input fields to have double quotes.
$ echo '"PRIM", "TRD", "GTR", "VAL", "FLAG"
"TPP", "T5-78", "HT", ":", "c"
"TCP", "T5-78", "HT", "12,385", "c"
"TZP", "T5-78", "HT", ":", "z"
"TNP", "T5-78", "HT", ":", "z"
"TNP", "T5-78", "HT", ":", "cd"
"TNP", "T5-78", "HT", ":", "du"
"TNP", "T5-78", "HT", "12,524,652", "dfg"' |
awk -F '",[ \t]*"' '
{ sub(/^"/, "", $1); sub(/"$/, "", $NF)}
NR == 1 {
for (i=1; i<=NF; i++) col[$i] = i
print "PRIM TRD GTR NUM FLAG1 FLAG2"
next
}
{
f = $col["FLAG"]
v = $col["VAL"]; gsub(/,/, "", v)
num = "null"; flag1 = "null"; flag2 = "null"
}
v == ":" && f == "c" {flag1 = "c"}
v == ":" && f == "u" {flag2 = "u"}
v == "14385" && f == "d" {num = $4}
v == "14385" && f == "du" {num = $4; flag2 = "u"}
v == ":" && f == "cd" {flag1 = "c"}
v == ":" && f == "bc" {flag1 = "c"; flag2 = "b"}
v == ":" && f == "z" {num = 0; flag2 = "z"}
{print $col["PRIM"],$col["TRD"],$col["GTR"],num,flag1,flag2}
'
PRIM TRD GTR NUM FLAG1 FLAG2
TPP T5-78 HT null c null
TCP T5-78 HT null null null
TZP T5-78 HT null null z
TNP T5-78 HT null null z
TNP T5-78 HT null c null
TNP T5-78 HT null null null
TNP T5-78 HT null null null
My output does not look like yours. Review your specifications and make sure the sample input is sufficient to cover them.
Upvotes: 2