user3569977
user3569977

Reputation: 31

how to split one column into another by multiple conditions?

I have to create an awk script to achieve the following transformation:

  1. Column order is random
  2. There is no fixed structure. actually this is a big issue
  3. The FLAG column must be splitt in FLAG1 and FLAG2
  4. FLAG1 and FLAG2 are filled given the following conditions:

    if the VAL is ":" then NUM is null
    if the VAL is ":" and FLAG "c" then NUM is null and FLAG1 is "c"
    if the VAL is ":" and FLAG "u" then NUM is null and FLAG2 is "u"
    if the VAL is "14,385" and FLAG "d" then NUM is "14385" and FLAG(both) is null
    if the VAL is "14,385" and FLAG "du" then NUM is "14385" and FLAG2 is "u"
    if the VAL is ":" and FLAG "cd" then NUM is null and FLAG1 is "c"
    if the VAL is ":" and FLAG "bc" then NUM is null and FLAG1 is "c" and FLAG2 is "b"
    if the VAL is ":" and FLAG "z" then NUM is 0 and FLAG2 is "z"
    

The csv input file is:

"PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"
"TPP",   "T5-78", "HT",    ":",   c
"TCP",   "T5-78", "HT",    "12,385",  c
"TZP",   "T5-78", "HT",    ":",   z
"TNP",   "T5-78", "HT",    ":",   z
"TNP",   "T5-78", "HT",    ":",   cd
"TNP",   "T5-78", "HT",    ":",   du
"TNP",   "T5-78", "HT",    "12,524,652",  dfg

The output .dat file should look like this:

PRIM    TRD GTR NUM FLAG1   FLAG2
TPP T5-78   HT  null    c   null
TCP T5-78   HT  12385   c   null
TZP T5-78   HT  0   null    z
TNP T5-78   HT  0   null    z
TNP T5-78   HT  null    c   null
TNP T5-78   HT  null    null    u
TNP T5-78   HT  12524652    null    dfg

The code I have tried doesn't work properly as only the first 3 requirements are met while the 4th just doesn't work.

BEGIN {
      FS=","; OFS="\t";
      a["PRIM"]=1;a["TRD"]=1;a["GTR"]=1;a["VAL"]=1;a["FLAG"]=1;
    }
    NR==1 {

   { $a["VAL"] = "NUMB" ; $a["FLAG"] = "FLAG1" ; $5 = "FLAG2" ; print ; next }
    $a["VAL"]=="12,385" && $a["FLAG"] == "d"  { $a["VAL"] = "14385" ; $a["FLAG"] = $5 = "" }
    $a["VAL"]=="12,385" && $a["FLAG"] == "du" { $a["VAL"] = "14385" ; $a["FLAG"] = "" ; $9 = "u" }
    $a["VAL"] != ":" { print ; next }
    $a["FLAG"] == "z" { $a["VAL"] = "0" ; $a["FLAG"] = "" ; $5 = "z" }
     $a["FLAG"] != "z" { $a["VAL"] = "" }

        $NF=substr($NF,1,length($NF)-1);
        for(i=1;i<=NF;i++) if($i in a) a[$i]=i;
    }
    {   print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
        NR==1?"FLAG1"OFS"FLAG2":($a["FLAG"]?""OFS$a["FLAG"]:$a["FLAG"]);

Here is the latest code which I think it will work. the problem now which I cannot solve is that the last value (FLAG2) is printed on the second line. I tried to put OFS but it doesn't fix the problem. Could you please tell me what is wrong in this case.

BEGIN {
FS=","; 
OFS="\t";
a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}

NR==1 {
    $NF=substr($NF,1,length($NF)-1);
    for(i=1;i<=NF;i++) 
#if($i in a) 
a[$i]=i;

a["FLAG1"] = i;
a["FLAG2"]=i;
a["FLAG1"] = a["FLAG"];  # just for testing and it is ok
a["FLAG2"] = a["FLAG"];  # just for testing and it is ok

}

{   

print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
    NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];

}

the output is something like

PRIM    TRD GTR NUM FLAG1   FLAG2
TPP T5-78   HT  null    c
   null
TCP T5-78   HT  12385   c
   null
TZP T5-78   HT  0   null
    z

After so many advices this is my last version but it is still unsuccesful... Now when I'm adding if statements for fulfilling the requirements above, nothing happens. I believe the if statements either are not correct nor putted in the right place. printing the values if NR>1 is a catastropha.. could you pls tell me what is wrong with my script? I have to admit that I started with this awk 3 days ago and so far it is painful...the problem is that I have should have finished this script since last week

BEGIN {
FS=",";
OFS="\t";

a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}

NR==1 {

$NF=substr($NF,1,length($NF)-1);
    for(i=1;i<=NF;i++)
#if($i in a)
a[$i]=i;

#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];

a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}

{
#initialisation of the new flags
a["FLAG1"]=="";
a["FLAG2"]=="";
}

#MY IF STATEMENTS GO HERE   - TEST MODE   

a["FLAG"] == "cd"   {a["FLAG1"]= "c"}
a["FLAG"] == "du"   {a["FLAG2"]= "u"}

{  
#print header
print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"], NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];
}

#print content
NR>1
{
    for(j=1;j<=NF;j++)
#if($i in a)
a[$j]=j;

#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];

a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}
#MY IF STATEMENTS GO HERE   - TEST MODE   

a["FLAG"] == "cd"   {a["FLAG1"]= "c"}
a["FLAG"] == "du"   {a["FLAG2"]= "u"}

{
print $a["PRIM"],$a["TRD"],$a["GTR"],$a["VAL"], $a["FLAG1"], $a["FLAG2"]
}

Upvotes: 3

Views: 223

Answers (2)

user3569977
user3569977

Reputation: 31

the final working solution is exactly like this one:

BEGIN {
FS=",";
OFS="\t";
}
{
# delete the carriage return character from windows. the magic part
sub(/\015$/,"")
sub(/^"/, "", $1)
 sub(/"$/, "", $NF)
}

NR==1{

        #$NF=substr($NF,1,length($NF)-1);
        for (i=1;i<=NF;i++) col[$i]=i
        print "PRIM TRD GTR NUM FLAG1 FLAG2"
        next
}

{

        f=$col["FLAG"];
        v=$col["NUM"];
        gsub(/,/, "", v)
        gsub(/,/, "", f)
       flag1 = "";
       flag2 = "";

   if(substr(v,1,1) == ":" && substr(f,1,1) == "c")
    {
      flag1 = "c";
      flag2 ="";
    }

if ( ! ( substr(f,1,1) == "dz" || substr(f,1,1) =="du" || substr(f,1,1) =="cd" || substr(f,1,1) =="c" || substr(f,1,1) =="z") || substr(f,1,1) =="u" || substr(f,1,1) =="d" ) )
    {
      flag1 = "";
      flag2 =f;

    }

     print $col["PRIM"],$col["TRD"],$col["GTR"],v,flag1,flag2 ;


}

Upvotes: 0

glenn jackman
glenn jackman

Reputation: 246744

This requires all input fields to have double quotes.

$ echo '"PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"
"TPP",   "T5-78", "HT",    ":",   "c"
"TCP",   "T5-78", "HT",    "12,385",  "c"
"TZP",   "T5-78", "HT",    ":",   "z"
"TNP",   "T5-78", "HT",    ":",   "z"
"TNP",   "T5-78", "HT",    ":",   "cd"
"TNP",   "T5-78", "HT",    ":",   "du"
"TNP",   "T5-78", "HT",    "12,524,652",  "dfg"' | 
awk -F '",[ \t]*"' '
    { sub(/^"/, "", $1); sub(/"$/, "", $NF)}
    NR == 1 {
        for (i=1; i<=NF; i++) col[$i] = i
        print "PRIM TRD GTR NUM FLAG1 FLAG2"
        next
    } 
    {
        f = $col["FLAG"] 
        v = $col["VAL"]; gsub(/,/, "", v) 
        num = "null"; flag1 = "null"; flag2 = "null"
    }
    v == ":"      &&  f == "c"   {flag1 = "c"}
    v == ":"      &&  f == "u"   {flag2 = "u"} 
    v == "14385"  &&  f == "d"   {num = $4}
    v == "14385"  &&  f == "du"  {num = $4; flag2 = "u"}
    v == ":"      &&  f == "cd"  {flag1 = "c"}
    v == ":"      &&  f == "bc"  {flag1 = "c"; flag2 = "b"}
    v == ":"      &&  f == "z"   {num = 0; flag2 = "z"}
    {print $col["PRIM"],$col["TRD"],$col["GTR"],num,flag1,flag2}
'
PRIM TRD GTR NUM FLAG1 FLAG2
TPP T5-78 HT null c null
TCP T5-78 HT null null null
TZP T5-78 HT null null z
TNP T5-78 HT null null z
TNP T5-78 HT null c null
TNP T5-78 HT null null null
TNP T5-78 HT null null null

My output does not look like yours. Review your specifications and make sure the sample input is sufficient to cover them.

Upvotes: 2

Related Questions