LowerMoon
LowerMoon

Reputation: 61

Replace empty spaces in a column with a character

My file looks like this:

  Scenario 1                                     0.20          0.00     0.00 r
  Scenario 2                                     0.08          0.34 &   0.34 r
  Scenario 3                          6   12.95 
  Scenario 4                              0.00   0.08   0.00   0.00 &   0.35 r
  Scenario 5                                     0.07          0.08 &   0.42 r
  Scenario 6                          6   8.70 
  Scenario 7                              0.00   0.07   0.00   0.00 &   0.42 r
  Scenario 8                                     0.31          0.28 &   0.70 f
  Scenario 9                          5   5.06 

My objectives is: To replace columns with empty cells/spaces/absent values with "-" (there are a total of 8 fields)

The problem I'm facing while using the awk command to do this is that the field separator keeps changing with every line.

What I've done so far: I've extracted the lines which have certain field patterns and placed them in different files. Eg: I have placed Scenario 3,6 and 9 in one file and the rest in another file to make it easier to work on the data. What I have now is:

File 1:

Scenario 3                          6   12.95
Scenario 6                          6   8.70
Scenario 9                          5   5.06

File 2:

  Scenario 1                                     0.20          0.00     0.00 r
  Scenario 2                                     0.08          0.34 &   0.34 r

  Scenario 4                              0.00   0.08   0.00   0.00 &   0.35 r
  Scenario 5                                     0.07          0.08 &   0.42 r

  Scenario 7                              0.00   0.07   0.00   0.00 &   0.42 r
  Scenario 8                                     0.31          0.28 &   0.70 f

Expected output:

  Scenario 1                          -     -    0.20    -     0.00     0.00 r
  Scenario 2                          -     -    0.08    -     0.34 &   0.34 r
  Scenario 3                          6   12.95   -      -      -        -
  Scenario 4                          -   0.00   0.08   0.00   0.00 &   0.35 r
  Scenario 5                          -     -    0.07    -     0.08 &   0.42 r
  Scenario 6                          6   8.70    -      -      -        -
  Scenario 7                          -   0.00   0.07   0.00   0.00 &   0.42 r
  Scenario 8                          -     -    0.31          0.28 &   0.70 f
  Scenario 9                          5   5.06    -      -      -        -

Case 1(using awk with FIELDWIDTHS):

  $ awk 'BEGIN { FIELDWIDTHS="37 3 7 7 7 9 9 "} {for(i=1;i<=NF;++i){printf $i"|"};print""}' main1.txt

| I_BLENDER_0/R_137/CLK (SDFFX2_HVT) |   |       |  0.20 |       |  0.00   |  0.00 r
| I_BLENDER_0/R_137/Q (SDFFX2_HVT)   |   |       |  0.08 |       |  0.34 & |  0.34 r
| I_BLENDER_0/n2757 (net)            | 6 |  12.95|
| I_BLENDER_0/U4847/A1 (AND2X1_LVT)  |   |  0.00 |  0.08 |  0.00 |  0.00 & |  0.35 r
| I_BLENDER_0/U4847/Y (AND2X1_LVT)   |   |       |  0.07 |       |  0.08 & |  0.42 r
| I_BLENDER_0/n2616 (net)            | 6 |  8.70 |
| I_BLENDER_0/U1/A4 (NAND4X0_HVT)    |   |  0.00 |  0.07 |  0.00 |  0.00 & |  0.42 r
| I_BLENDER_0/U1/Y (NAND4X0_HVT)     |   |       |  0.31 |       |  0.28 & |  0.70 f

Case 2(using sed command):

  $  sed "s/^\(.\{,36\}\)$/\1`echo -$_{1..30}|tr -d '-'`/;
      s/^\(.\{38\}\) /\1-/;
      s/^\(.\{43\}\) /\1-/;
      s/^\(.\{50\}\) /\1-/;
      s/^\(.\{57\}\) /\1-/;
      s/^\(.\{64\}\) /\1-/;
      s/^\(.\{73\}\) /\1-/;
      s/ *$//"



  I_BLENDER_0/R_137/CLK (SDFFX2_HVT)  -    -     0.20    -     0.00     0.00 r
  I_BLENDER_0/R_137/Q (SDFFX2_HVT)    -    -     0.08    -     0.34 &   0.34 r
  I_BLENDER_0/n2757 (net)             6   12.95
  I_BLENDER_0/U4847/A1 (AND2X1_LVT)   -   0.00   0.08   0.00   0.00 &   0.35 r
  I_BLENDER_0/U4847/Y (AND2X1_LVT)    -    -     0.07    -     0.08 &   0.42 r
  I_BLENDER_0/n2616 (net)             6   8.70

Upvotes: 1

Views: 508

Answers (3)

kvantour
kvantour

Reputation: 26481

To do this, you can make use of FIELDWIDTHS in Gnu awk:

Basically, we split your lines in constant width fields. The following shows that the lines are split correctly:

$ awk 'BEGIN{ FIELDWIDTHS="13 25 2 7 7 7 9 9"}
       {for(i=1;i<=NF;++i){printf $i"|"};print""}' file

  Scenario 1 |                        |   |       |  0.20 |       |  0.00   |  0.00 r|
  Scenario 2 |                        |   |       |  0.08 |       |  0.34 & |  0.34 r|
  Scenario 3 |                        | 6 |  12.95| ||||
  Scenario 4 |                        |   |  0.00 |  0.08 |  0.00 |  0.00 & |  0.35 r|
  Scenario 5 |                        |   |       |  0.07 |       |  0.08 & |  0.42 r|
  Scenario 6 |                        | 6 |  8.70 |||||
  Scenario 7 |                        |   |  0.00 |  0.07 |  0.00 |  0.00 & |  0.42 r|
  Scenario 8 |                        |   |       |  0.31 |       |  0.28 & |  0.70 f|
  Scenario 9 |                        | 5 |  5.06 |||||

So all we need to do is replace the empty fields with the dash if needed.

$ awk 'BEGIN{ FIELDWIDTHS="13 24 3 7 7 7 9 9"}
       {s=$1$2}
       {s=s ($3~/^[[:blank:]]*$/?" - ":$3)}
       {s=s ($4~/^[[:blank:]]*$/?"   -   ":$4)}
       {s=s ($5~/^[[:blank:]]*$/?"   -   ":$5)}
       {s=s ($6~/^[[:blank:]]*$/?"   -   ":$6)}
       {s=s ($7~/^[[:blank:]]*$/?"   -     ":$7)}
       {s=s ($8~/^[[:blank:]]*$/?"   -     ":$8)}
       {print s}' file

and this gives:

  Scenario 1                          -    -     0.20    -     0.00     0.00 r
  Scenario 2                          -    -     0.08    -     0.34 &   0.34 r
  Scenario 3                          6   12.95   -      -      -        -     
  Scenario 4                          -   0.00   0.08   0.00   0.00 &   0.35 r
  Scenario 5                          -    -     0.07    -     0.08 &   0.42 r
  Scenario 6                          6   8.70    -      -      -        -     
  Scenario 7                          -   0.00   0.07   0.00   0.00 &   0.42 r
  Scenario 8                          -    -     0.31    -     0.28 &   0.70 f
  Scenario 9                          5   5.06    -      -      -        -     

remarks:

  • it would be better to use the real formatting that was used to set up these files.
  • I always leave an extra space before the fields to account for possible minus-signs
  • It looks like the floats are written with format %-5.2f. This is why the number 12.95 is not aligned. (%6.2f would have been better)

note: if you play a bit around, you can actually do it shorter. But you sort of lose the feeling of what is going on.

awk 'BEGIN{ FIELDWIDTHS="13 23 5 7 7 7 9 9"} 
     {for(i=3;i<=NF;++i)$i=$i~/^[[:blank:]]*$/?"  -":$i}
     {printf "%-13s%-23s%-5s%-7s%-7s%-7s%-9s%-9s\n",$1,$2,$3,$4,$5,$6,$7,$8}' file

or even shorter

awk 'BEGIN{ FIELDWIDTHS="36 5 7 7 7 9 9"; split(FIELDWIDTHS,a)}
     {for(i=1;i<=NF;++i) printf "%-*s",a[i], ($i~/^ *$/?"  -":$i); print ""}'

Upvotes: 1

oliv
oliv

Reputation: 13249

Using GNU awk and FIELDWIDTHS variable to split fields based on their length:

awk 'BEGIN{
      FIELDWIDTHS="38 4 7 7 7 9 6"
      colnr=split(FIELDWIDTHS,a," ")
    } 
    {
      for(i=1;i<=colnr;i++){
        $i=sprintf("%-"a[i]"s",((!$i&&$i!=0)||$i~/^ *$/?"-":$i))
      }
    }1' file
  Scenario 1                           -    -       0.20    -       0.00      0.00 r
  Scenario 2                           -    -       0.08    -       0.34 &    0.34 r
  Scenario 3                           6    12.95   -       -       -         -
  Scenario 4                           -    0.00    0.08    0.00    0.00 &    0.35 r
  Scenario 5                           -    -       0.07    -       0.08 &    0.42 r
  Scenario 6                           6    8.70    -       -       -         -
  Scenario 7                           -    0.00    0.07    0.00    0.00 &    0.42 r
  Scenario 8                           -    -       0.31    -       0.28 &    0.70 f
  Scenario 9                           5    5.06    -       -       -         -

The BEGIN block sets the array a with the length of all fields, and store the number of fields in the variable colnr.

The default block loops through all fields and rewrites them with the sprintf() function.
If the field contains only blanks $i~/^ *$/ or doesn't exist !$i&&$i!=0, replace it with a -. If not, the field remains untouched.

Upvotes: 1

Oleg
Oleg

Reputation: 108

Unfortunately, in this case you need to carefully count the character columns. Here is the code for the input that you provided -- you may need to adjust the numbers for your real input file.

sed "s/^\(.\{,78\}\)$/\1`echo -$_{1..78}|tr -d '-'`/;
  s/^\(.\{38\}\) /\1-/;
  s/^\(.\{43\}\) /\1-/;
  s/^\(.\{50\}\) /\1-/;
  s/^\(.\{57\}\) /\1-/;
  s/^\(.\{64\}\) /\1-/;
  s/^\(.\{73\}\) /\1-/;
  s/ *$//" input_file

Here, the first line adds spaces in the end of line in case the line terminates before reaching 78 characters -- this is then exploited by the substitution. In the end of chain substitutions, any trailing space is removed.

The messy-looking expression echo -$_{1..78}|tr -d '-' in the first line simply produces 78 spaces. You may want to just replace it with a long line of spaces.

Upvotes: 1

Related Questions