Reputation: 637
I have an input file, a reference file, and a script. The script reads parameters in the reference file; then it scans the input file for the same parameters and replaces their values if the values are different.
Whenever the script replaces a value in the input file, it records the previous value with a timestamp, then writes the new value on a new line below it. This gives me a history of meaningful updates to parameters.
Input File (test.txt)
testx=1
# testy=2
#testz=3
foobar=2
path=/data/me/testing
plainpath=/data/me/stack
testw=4
Reference File (ref.txt)
foobar=10
path=$MY_HOME
plainpath=/data/you/stack
testy=stack
Script (script.sh) - Thanks to Ed Morton for the formatting
#!/bin/bash
Timestamp=$(date '+%Y%m%d_%H:%M:%S')
myhome=/data/stack/testing
awk -F= -v stamp="$Timestamp" '
    (NR == FNR && /=/ && !/^#/) {
        for (j = 2; j < NF; j++) {
            a[$1] = a[$1] $j "="
        }
        a[$1] = a[$1] $NF
    }
    (NR != FNR && $1 in a && $1 > 0) {
        if ($2 !~ a[$1]) {
            $0 = "###EDITED_ON " stamp " from " $2 " to\n" $1 "=" a[$1]
        }
    }
    (NR != FNR && /^#[ a-zA-Z]/) {
        b = $1
        sub(/# */, "", b)
        if (b in a) {
            $0 = b "=" a[b]
        }
    }
    (NR != FNR) {
        print
    }
' ref.txt test.txt > tmp && mv tmp test.txt
sed -i 's,$MY_HOME,'"$myhome"',g' test.txt
Brief explanation of script (potentially skippable)
For NR==FNR, awk stores the variables in ref.txt in a hash 'a' with the variable name as key and the variable value as the paired value.
For NR != FNR, awk is scanning test.txt. It compares $1, a variable name, to check whether it's a key in the hash. If it's in the hash, it replaces the line with two lines. The first line has a timestamp and the old value. The second line has the parameter with the new value.
There is one additional NR != FNR block to handle parameters that are commented out. For simplicity of presentation, this block does not write a timestamp history.
Target code line
if($2 !~ a[$1])
This if condition means that the script does not rewrite a line when the replacement value already matches the value in the input file. This should ensure I only see meaningful updates from the script. Unfortunately, this is the line that returns a false positive for strings inserted by sed.
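As a side note on what this condition tests: in awk, `!~` treats its right-hand operand as a regular expression, not as a literal string, so it asks "does the pattern fail to match anywhere in the value?" rather than "are the two strings different?". Characters such as `.` or `$` in the reference value are therefore read as regex operators. The sketch below contrasts `!~` with the string comparison `!=`; the values `10` and `1` and the variable names `cur` and `ref` are made up for illustration and do not come from test.txt or ref.txt.

```shell
#!/bin/bash
# Hypothetical values: cur stands for the value in the input file,
# ref for the value in the reference file.

# Regex test: "1" used as a pattern matches inside "10", so !~ is false
# and the values are reported as the same even though they differ.
regex_result=$(awk 'BEGIN {
    cur = "10"; ref = "1"
    if (cur !~ ref) print "differs"; else print "same"
}')

# String test: != compares the two strings as a whole.
string_result=$(awk 'BEGIN {
    cur = "10"; ref = "1"
    if (cur != ref) print "differs"; else print "same"
}')

echo "regex test (!~):  $regex_result"
echo "string test (!=): $string_result"
```

Whether a string comparison is the right fix for the script depends on what should count as a "meaningful update"; the sketch only shows that the two operators answer different questions.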
Problem
There is a sed line below the awk code which substitutes variables. This allows me to conveniently write ref.txt with variables for frequently occurring values, and then substitute them all in one step at the end.
For some reason, when I substitute with sed, something about sed changes the nature of the inserted string. Even if the replacement value is the same, if I run the script a second time, awk will replace it and enter a timestamp for the new edit. It's making redundant updates. Here is the output after running the script twice:
testx=1
testy=stack
#testz=3
###EDITED_ON 20200702_11:35:42 from 2 to
foobar=10
###EDITED_ON 20200702_11:35:42 from /data/me/testing to
###EDITED_ON 20200702_11:35:46 from /data/stack/testing to
path=/data/stack/testing
###EDITED_ON 20200702_11:35:42 from /data/me/stack to
plainpath=/data/you/stack
testw=4
Notice that "plainpath" and "foobar" are not edited any further. Path, however, which was defined in ref.txt by the variable $MY_HOME and substituted out by sed, is continually updated with the same value. I can run this infinitely, and it will always update this line.
Eliminating the sed line isn't a crisis for my project, but I am interested in why sed and awk interact this way.
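One way to investigate is to print both operands of the comparison on the second pass instead of editing anything. The sketch below is a minimal, hedged reproduction: it assumes, as in the script above, that sed edits only test.txt, so ref.txt still holds the literal text `$MY_HOME` while test.txt already holds the substituted path. The scratch directory and the variable names `workdir` and `operands` are introduced only for this illustration.

```shell
#!/bin/bash
# Recreate the state after the first run in a scratch directory.
workdir=$(mktemp -d)
printf 'path=$MY_HOME\n' > "$workdir/ref.txt"              # never touched by sed
printf 'path=/data/stack/testing\n' > "$workdir/test.txt"  # already substituted by sed

# Print what awk would compare for each parameter found in both files.
operands=$(awk -F= '
    NR == FNR { a[$1] = $2; next }   # first file: remember reference values
    $1 in a   { printf "input=[%s] reference=[%s]\n", $2, a[$1] }
' "$workdir/ref.txt" "$workdir/test.txt")

echo "$operands"
rm -rf "$workdir"
```

The brackets in the output make leading or trailing whitespace visible as well, so the same check covers both a placeholder that was never expanded on the reference side and stray characters on either side.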
Questions
Thank you very much!
Upvotes: 1
Views: 183
Reputation: 6144
If your script considers both path values as different...
###EDITED_ON 20200702_11:35:46 from /data/stack/testing to
path=/data/stack/testing
... then you ought to assume that what is after the equal sign (the field separator in your awk script) is indeed different, but the actual problem is that you don't see the difference.
If you can't see it, it is probably because the difference consists of whitespace at the end of the line, for example a space (" "), a tab, or even a carriage return (CR) if your file was edited on Windows (lines end with CR+LF on Windows, but Unix only treats the LF character as the line ending).
Use a hex editor or cat -A to see what is hidden from your eyes.
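A minimal sketch of that check, assuming a made-up sample file rather than the asker's real test.txt: the two lines look identical on screen, but the second ends with a carriage return. `cat -A` is a GNU coreutils option, so the sketch uses `od -c`, which is specified by POSIX, and then counts the affected lines with awk. The names `sample` and `cr_lines` are hypothetical.

```shell
#!/bin/bash
# Hypothetical sample: two lines that look the same; the second ends with CR.
sample=$(mktemp)
printf 'path=/data/stack/testing\npath=/data/stack/testing\r\n' > "$sample"

# od -c prints control characters as escapes such as \r and \n.
# With GNU coreutils, cat -A "$sample" shows a CR as ^M and line ends as $.
od -c "$sample"

# Count the lines that end with a carriage return.
cr_lines=$(awk '/\r$/ { n++ } END { print n + 0 }' "$sample")
echo "lines ending in CR: $cr_lines"

rm -f "$sample"
```

If the count is zero for the real file, trailing whitespace of this kind is not the cause and the difference has to be somewhere else in the compared values.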
Upvotes: 0