Reputation: 61
My file looks like this:
Scenario 1 0.20 0.00 0.00 r
Scenario 2 0.08 0.34 & 0.34 r
Scenario 3 6 12.95
Scenario 4 0.00 0.08 0.00 0.00 & 0.35 r
Scenario 5 0.07 0.08 & 0.42 r
Scenario 6 6 8.70
Scenario 7 0.00 0.07 0.00 0.00 & 0.42 r
Scenario 8 0.31 0.28 & 0.70 f
Scenario 9 5 5.06
My objectives is: To replace columns with empty cells/spaces/absent values with "-" (there are a total of 8 fields)
The problem I'm facing while using the awk command to do this is that the field separator keeps changing with every line.
What I've done so far: I've extracted the lines which have certain field patterns and placed them in different files. Eg: I have placed Scenario 3,6 and 9 in one file and the rest in another file to make it easier to work on the data. What I have now is:
File 1:
Scenario 3 6 12.95
Scenario 6 6 8.70
Scenario 9 5 5.06
File 2:
Scenario 1 0.20 0.00 0.00 r
Scenario 2 0.08 0.34 & 0.34 r
Scenario 4 0.00 0.08 0.00 0.00 & 0.35 r
Scenario 5 0.07 0.08 & 0.42 r
Scenario 7 0.00 0.07 0.00 0.00 & 0.42 r
Scenario 8 0.31 0.28 & 0.70 f
Expected output:
Scenario 1 - - 0.20 - 0.00 0.00 r
Scenario 2 - - 0.08 - 0.34 & 0.34 r
Scenario 3 6 12.95 - - - -
Scenario 4 - 0.00 0.08 0.00 0.00 & 0.35 r
Scenario 5 - - 0.07 - 0.08 & 0.42 r
Scenario 6 6 8.70 - - - -
Scenario 7 - 0.00 0.07 0.00 0.00 & 0.42 r
Scenario 8 - - 0.31 0.28 & 0.70 f
Scenario 9 5 5.06 - - - -
Case 1(using awk with FIELDWIDTHS):
$ awk 'BEGIN { FIELDWIDTHS="37 3 7 7 7 9 9 "} {for(i=1;i<=NF;++i){printf $i"|"};print""}' main1.txt
| I_BLENDER_0/R_137/CLK (SDFFX2_HVT) | | | 0.20 | | 0.00 | 0.00 r
| I_BLENDER_0/R_137/Q (SDFFX2_HVT) | | | 0.08 | | 0.34 & | 0.34 r
| I_BLENDER_0/n2757 (net) | 6 | 12.95|
| I_BLENDER_0/U4847/A1 (AND2X1_LVT) | | 0.00 | 0.08 | 0.00 | 0.00 & | 0.35 r
| I_BLENDER_0/U4847/Y (AND2X1_LVT) | | | 0.07 | | 0.08 & | 0.42 r
| I_BLENDER_0/n2616 (net) | 6 | 8.70 |
| I_BLENDER_0/U1/A4 (NAND4X0_HVT) | | 0.00 | 0.07 | 0.00 | 0.00 & | 0.42 r
| I_BLENDER_0/U1/Y (NAND4X0_HVT) | | | 0.31 | | 0.28 & | 0.70 f
Case 2(using sed command):
$ sed "s/^\(.\{,36\}\)$/\1`echo -$_{1..30}|tr -d '-'`/;
s/^\(.\{38\}\) /\1-/;
s/^\(.\{43\}\) /\1-/;
s/^\(.\{50\}\) /\1-/;
s/^\(.\{57\}\) /\1-/;
s/^\(.\{64\}\) /\1-/;
s/^\(.\{73\}\) /\1-/;
s/ *$//"
I_BLENDER_0/R_137/CLK (SDFFX2_HVT) - - 0.20 - 0.00 0.00 r
I_BLENDER_0/R_137/Q (SDFFX2_HVT) - - 0.08 - 0.34 & 0.34 r
I_BLENDER_0/n2757 (net) 6 12.95
I_BLENDER_0/U4847/A1 (AND2X1_LVT) - 0.00 0.08 0.00 0.00 & 0.35 r
I_BLENDER_0/U4847/Y (AND2X1_LVT) - - 0.07 - 0.08 & 0.42 r
I_BLENDER_0/n2616 (net) 6 8.70
Upvotes: 1
Views: 508
Reputation: 26481
To do this, you can make use of FIELDWIDTHS
in Gnu awk:
Basically, we split your lines in constant width fields. The following shows that the lines are split correctly:
$ awk 'BEGIN{ FIELDWIDTHS="13 25 2 7 7 7 9 9"}
{for(i=1;i<=NF;++i){printf $i"|"};print""}' file
Scenario 1 | | | | 0.20 | | 0.00 | 0.00 r|
Scenario 2 | | | | 0.08 | | 0.34 & | 0.34 r|
Scenario 3 | | 6 | 12.95| ||||
Scenario 4 | | | 0.00 | 0.08 | 0.00 | 0.00 & | 0.35 r|
Scenario 5 | | | | 0.07 | | 0.08 & | 0.42 r|
Scenario 6 | | 6 | 8.70 |||||
Scenario 7 | | | 0.00 | 0.07 | 0.00 | 0.00 & | 0.42 r|
Scenario 8 | | | | 0.31 | | 0.28 & | 0.70 f|
Scenario 9 | | 5 | 5.06 |||||
So all we need to do is replace the empty fields with the dash if needed.
$ awk 'BEGIN{ FIELDWIDTHS="13 24 3 7 7 7 9 9"}
{s=$1$2}
{s=s ($3~/^[[:blank:]]*$/?" - ":$3)}
{s=s ($4~/^[[:blank:]]*$/?" - ":$4)}
{s=s ($5~/^[[:blank:]]*$/?" - ":$5)}
{s=s ($6~/^[[:blank:]]*$/?" - ":$6)}
{s=s ($7~/^[[:blank:]]*$/?" - ":$7)}
{s=s ($8~/^[[:blank:]]*$/?" - ":$8)}
{print s}' file
and this gives:
Scenario 1 - - 0.20 - 0.00 0.00 r
Scenario 2 - - 0.08 - 0.34 & 0.34 r
Scenario 3 6 12.95 - - - -
Scenario 4 - 0.00 0.08 0.00 0.00 & 0.35 r
Scenario 5 - - 0.07 - 0.08 & 0.42 r
Scenario 6 6 8.70 - - - -
Scenario 7 - 0.00 0.07 0.00 0.00 & 0.42 r
Scenario 8 - - 0.31 - 0.28 & 0.70 f
Scenario 9 5 5.06 - - - -
remarks:
%-5.2f
. This is why the number 12.95
is not aligned. (%6.2f
would have been better)note: if you play a bit around, you can actually do it shorter. But you sort of lose the feeling of what is going on.
awk 'BEGIN{ FIELDWIDTHS="13 23 5 7 7 7 9 9"}
{for(i=3;i<=NF;++i)$i=$i~/^[[:blank:]]*$/?" -":$i}
{printf "%-13s%-23s%-5s%-7s%-7s%-7s%-9s%-9s\n",$1,$2,$3,$4,$5,$6,$7,$8}' file
or even shorter
awk 'BEGIN{ FIELDWIDTHS="36 5 7 7 7 9 9"; split(FIELDWIDTHS,a)}
{for(i=1;i<=NF;++i) printf "%-*s",a[i], ($i~/^ *$/?" -":$i); print ""}'
Upvotes: 1
Reputation: 13249
Using GNU awk and FIELDWIDTHS
variable to split fields based on their length:
awk 'BEGIN{
FIELDWIDTHS="38 4 7 7 7 9 6"
colnr=split(FIELDWIDTHS,a," ")
}
{
for(i=1;i<=colnr;i++){
$i=sprintf("%-"a[i]"s",((!$i&&$i!=0)||$i~/^ *$/?"-":$i))
}
}1' file
Scenario 1 - - 0.20 - 0.00 0.00 r
Scenario 2 - - 0.08 - 0.34 & 0.34 r
Scenario 3 6 12.95 - - - -
Scenario 4 - 0.00 0.08 0.00 0.00 & 0.35 r
Scenario 5 - - 0.07 - 0.08 & 0.42 r
Scenario 6 6 8.70 - - - -
Scenario 7 - 0.00 0.07 0.00 0.00 & 0.42 r
Scenario 8 - - 0.31 - 0.28 & 0.70 f
Scenario 9 5 5.06 - - - -
The BEGIN
block sets the array a
with the length of all fields, and store the number of fields in the variable colnr
.
The default block loops through all fields and rewrites them with the sprintf()
function.
If the field contains only blanks $i~/^ *$/
or doesn't exist !$i&&$i!=0
, replace it with a -
. If not, the field remains untouched.
Upvotes: 1
Reputation: 108
Unfortunately, in this case you need to carefully count the character columns. Here is the code for the input that you provided -- you may need to adjust the numbers for your real input file.
sed "s/^\(.\{,78\}\)$/\1`echo -$_{1..78}|tr -d '-'`/;
s/^\(.\{38\}\) /\1-/;
s/^\(.\{43\}\) /\1-/;
s/^\(.\{50\}\) /\1-/;
s/^\(.\{57\}\) /\1-/;
s/^\(.\{64\}\) /\1-/;
s/^\(.\{73\}\) /\1-/;
s/ *$//" input_file
Here, the first line adds spaces in the end of line in case the line terminates before reaching 78 characters -- this is then exploited by the substitution. In the end of chain substitutions, any trailing space is removed.
The messy-looking expression echo -$_{1..78}|tr -d '-'
in the first line simply produces 78 spaces. You may want to just replace it with a long line of spaces.
Upvotes: 1