I have a file as follows:
2300
10 1112221234 111222123420231121PPPPD10+0000000850 ESIM
10 3334446789 333444678920231121PPPPD11+0000000950 RSIM
23
I want the output to be as follows:
2300
10 1112222345 111222234520231121PPPPD10+0000000850 ESIM
10 3334447890 333444789020231121PPPPD11+0000000950 RSIM
23
I tried the code below, which replaces the last 4 digits in the second column and the last 4 digits before the date in the third column. However, it also removes the extra spaces, and in the third column it drops all letters and digits from the 11th character onwards. This is what I got:
2300
10 1112222345 1112222345 ESIM
10 3334447890 3334447890 RSIM
23
awk '
BEGIN { FS=OFS=" " }
{if(length($2)>9 && length($3)>9)
{$2 = substr($2,-10)
$3 = substr($3,1,10)
for (i=2;i<=3;i++) {
str = substr($i, 1, length($i) - 4)
for (j = length($i) - 3; j <= length($i); j++) {
str = str (substr($i, j, 1) + 1) % 10
}
$i = str
}
}}
1' filename
It is not clear if the characters to replace in the third field are always characters 7 to 10 (CASE 1) or if the third field always starts with digits, and the date part is always the last 8 digits before the first non-digit character, as in your example (CASE 2). Let's deal with both.
Your problem comes from the fact that you update the fields, which forces awk to recompute the record using the output field separator (OFS), that is, a single space, instead of the original separators. Moreover, you overwrite $2 and $3 with the substr(...) results to keep only 10 characters, discarding the others, which is not what you want.
To avoid discarding parts of the second and third fields, well... don't discard them. There are several ways to preserve the original field separators, but the easiest to understand and design is probably to update the complete record ($0) instead of individual fields. Example for CASE 1:
awk 'length($2)>9 && length($3)>9 {
match($0,/^([[:space:]]*[^[:space:]]+){2}/); a[1]=RLENGTH-3
match($0,/^([[:space:]]*[^[:space:]]+){2}[[:space:]]+/); a[2]=RLENGTH+7
for(i=1; i<=2; i++) for(j=a[i]; j<a[i]+4; j++)
$0=substr($0,1,j-1) (substr($0,j,1)+1)%10 substr($0,j+1)
} 1' filename
2300
10 1112222345 111222234520231121PPPPD10+0000000850 ESIM
10 3334447890 333444789020231121PPPPD11+0000000950 RSIM
23
Explanations: we use match to find the index of the last character of the second field and of the last whitespace before the third field. Because both regular expressions are anchored with ^, RLENGTH is exactly that index. We then adjust these values to point to the first character to replace in each of the two 4-character substitutions (a[1] and a[2]).
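You can check the index arithmetic on the sample line. This sketch prints, for each of the two 4-digit groups, the computed start index and the 4 characters found there. The function name case1_positions is hypothetical. It uses [ \t] and [^ \t] instead of [[:space:]] and [^[:space:]], and spells out the repetition instead of {2}. That is an assumption made so the sketch also runs on awks without POSIX character classes or interval expressions:

```shell
#!/bin/sh
# case1_positions (hypothetical name): print the start index computed for each
# 4-digit group (CASE 1) and the 4 characters found there.
case1_positions() {
  awk '{
    # Anchored at ^, so RLENGTH is the index of the last character of field 2;
    # the 4 digits to change start 3 characters earlier.
    match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+/); a1 = RLENGTH - 3
    # Here RLENGTH is the index of the last blank before field 3, so
    # character 7 of field 3 is at RLENGTH + 7.
    match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/); a2 = RLENGTH + 7
    print a1, substr($0, a1, 4), a2, substr($0, a2, 4)
  }'
}

printf '%s\n' '10 1112221234 111222123420231121PPPPD10+0000000850 ESIM' | case1_positions
# prints: 10 1234 21 1234
```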
Example for CASE 2:
awk 'length($2)>9 && $3~/^[0-9]{12}/ {
match($0,/^([[:space:]]*[^[:space:]]+){2}/); a[1]=RLENGTH-3
match($0,/^([[:space:]]*[^[:space:]]+){2}[[:space:]]+[0-9]+/); a[2]=RLENGTH-11
for(i=1; i<=2; i++) for(j=a[i]; j<a[i]+4; j++)
$0=substr($0,1,j-1) (substr($0,j,1)+1)%10 substr($0,j+1)
} 1' filename
Explanations: we modify the pattern part to retain only records whose third field has at least 12 leading digits (the 4 digits to replace plus an 8-digit date). The second field is handled as before. For the third field, we search for the last leading digit and subtract 11 so that the index points to the first digit to replace.
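Here is the same kind of check for CASE 2, again as a sketch. The function name case2_positions and the second input line (with a longer digit prefix in the third field) are made up. The portability assumption is the same as above ([ \t] instead of [[:space:]]):

```shell
#!/bin/sh
# case2_positions (hypothetical name): locate the 4 digits just before the
# 8-digit date that ends the leading digits of field 3 (CASE 2).
case2_positions() {
  awk '{
    # RLENGTH is the index of the last leading digit of field 3.
    match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[0-9]+/)
    # 4 digits to change + 8-digit date = 12 digits, so step back 11.
    a2 = RLENGTH - 11
    print a2, substr($0, a2, 4), substr($0, a2 + 4, 8)
  }'
}

printf '%s\n' '10 1112221234 111222123420231121PPPPD10+0000000850 ESIM' | case2_positions
# prints: 21 1234 20231121
printf '%s\n' '10 1112221234 99111222123420231121PPPP ESIM' | case2_positions
# prints: 23 1234 20231121
```

On the second (made-up) line, CASE 1 would change characters 7 to 10 of the third field ("2212"). The CASE 2 computation instead lands on the 4 digits just before the date.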
If your awk is GNU awk, we can replace [[:space:]] with \s and [^[:space:]] with \S. But we can do even better, with only one match, thanks to the optional third argument of the GNU awk version of match: an array in which GNU awk stores the capture groups.
Example for CASE 1:
awk 'length($2)>9 && length($3)>9 {
match($0,/^(\s*\S+\s+\S+)(\S{3}\s+\S{7})/,b)
a[1]=length(b[1]); a[2]=a[1]+length(b[2])
for(i=1; i<=2; i++) for(j=a[i]; j<a[i]+4; j++)
$0=substr($0,1,j-1) (substr($0,j,1)+1)%10 substr($0,j+1)
} 1' filename
Still for CASE 1, FPAT is another interesting GNU awk feature: it lets you redefine fields so that each one also contains the separator that follows it. This probably leads to the simplest of all solutions:
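awk -v FPAT='[^[:space:]]+[[:space:]]*' -v OFS='' '
# each field keeps its trailing whitespace, so an empty OFS keeps the
# original separators when $0 is rebuilt
$2~/^\S{10}/ && $3~/^\S{10}/ {
for(i=2; i<=3; i++) for(j=7; j<=10; j++)
$i=substr($i,1,j-1) (substr($i,j,1)+1)%10 substr($i,j+1)
} 1' filename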
Example for CASE 2:
awk 'length($2)>9 && $3~/^[0-9]{12}/ {
match($0,/^(\s*\S+\s+\S+)(\S{3}\s+[0-9]+)[0-9]{11}/,b)
a[1]=length(b[1]); a[2]=a[1]+length(b[2])
for(i=1; i<=2; i++) for(j=a[i]; j<a[i]+4; j++)
$0=substr($0,1,j-1) (substr($0,j,1)+1)%10 substr($0,j+1)
} 1' filename
Note: in all examples except the one using FPAT, the regular expressions used in match also match lines with leading whitespace (spaces, TABs and newlines). Remove the leading [[:space:]]* or \s* if you want to skip lines with leading whitespace.
Note: FS=OFS=" " is already the default, so in your own code the BEGIN block is useless.
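A quick way to see that the BEGIN block changes nothing is shown in the sketch below. The sample line and the function names fields_default and fields_explicit are made up. With the default single-space FS, awk splits on runs of blanks and ignores leading and trailing blanks, whether or not FS is set explicitly:

```shell
#!/bin/sh
# Default FS is a single space, which awk treats specially: fields are split
# on runs of blanks and leading/trailing blanks are ignored.
fields_default()  { awk '{ print NF ":" $1 ":" $NF }'; }
# Setting FS = OFS = " " explicitly changes nothing.
fields_explicit() { awk 'BEGIN { FS = OFS = " " } { print NF ":" $1 ":" $NF }'; }

line='   10   1112221234  ESIM  '   # made-up sample with extra blanks
printf '%s\n' "$line" | fields_default    # 3:10:ESIM
printf '%s\n' "$line" | fields_explicit   # 3:10:ESIM
```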
If you capture each 'part of interest' from columns $2 and $3, increment the 4 digits, and then use printf to print the lines, you can get your desired outcome (note that the modified lines are printed with TAB separators), e.g.
awk 'BEGIN {
FS = OFS = " "
}
{
if (length($2) > 9 && length($3) > 9) {
col2_first_part = substr($2, 1, 6)   # awk string positions start at 1
col2_4_digits = substr($2, 7, 4)
col3_first_part = substr($3, 1, 6)
col3_4_digits = substr($3, 7, 4)
col3_last_part = substr($3, 11, length($3) - 10)
printf "%s\t%s", $1, col2_first_part
for (i = 1; i <= 4; i++) {
printf "%s", (substr(col2_4_digits, i, 1) + 1) % 10
}
printf "\t%s", col3_first_part
for (j = 1; j <= 4; j++) {
printf "%s", (substr(col3_4_digits, j, 1) + 1) % 10
}
printf "%s\t", col3_last_part
for (k = 4; k <= NF; k++) {
printf "%s%s", $k, (k < NF ? "\t" : "\n")
}
} else {
print
}
}' filename
2300
10 1112222345 111222234520231121PPPPD10+0000000850 ESIM
10 3334447890 333444789020231121PPPPD11+0000000950 RSIM
23
Assumptions:
- the value to update (old) is the entire 2nd column
- old is also the prefix of the 3rd column
- old only shows up twice in a line (as 2nd column, as prefix of 3rd column)
One awk idea:
awk '
NF==4 { old = $2
len = length(old)
new = substr(old,1,len-4)
for (i=len-3; i<=len; i++)
new = new ((substr(old,i,1)+1) % 10)
gsub(old,new) # replace both instances of "old" with "new"
}
1
' filename
This generates:
2300
10 1112222345 111222234520231121PPPPD10+0000000850 ESIM
10 3334447890 333444789020231121PPPPD11+0000000950 RSIM
23
With GNU awk, please try the following code (the three-argument form of match used here is specific to GNU awk). Written and tested with the shown samples.
awk -v OFS="\t" '
# $2: keep everything before the last 4 digits, add 1 (mod 10) to each of them
match($2,/(.*)([0-9])([0-9])([0-9])([0-9])$/,arr){
  $2 = arr[1] (arr[2]+1)%10 (arr[3]+1)%10 (arr[4]+1)%10 (arr[5]+1)%10
}
# $3: keep the first 6 characters, update characters 7-10, keep the rest
match($3,/(^.{6})([0-9])([0-9])([0-9])([0-9])(.*$)/,arr){
  $3 = arr[1] (arr[2]+1)%10 (arr[3]+1)%10 (arr[4]+1)%10 (arr[5]+1)%10 arr[6]
}
1
' Input_file | column -t
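The per-digit step used in most of these solutions, (d+1)%10, turns 9 into 0 without carrying into the neighbouring digit. A small portable sketch makes that behaviour easy to check on values that contain 9s. The shell function bump4 and the awk function inc4 are hypothetical names:

```shell
#!/bin/sh
# bump4 POS (hypothetical helper): for every input line, add 1 modulo 10 to
# each of the 4 characters starting at position POS; the rest is untouched.
bump4() {
  awk -v p="$1" '
    function inc4(s, p,    j) {
      for (j = p; j < p + 4; j++)
        s = substr(s, 1, j - 1) (substr(s, j, 1) + 1) % 10 substr(s, j + 1)
      return s
    }
    { print inc4($0, p) }'
}

printf '%s\n' 1112221234 | bump4 7   # 1112222345
printf '%s\n' 3334446789 | bump4 7   # 3334447890
printf '%s\n' 1112221299 | bump4 7   # 1112222300 (9 wraps to 0, no carry)
```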