lfc_07

Reputation: 37

Replace in one file with value from another file not working properly

I have two files: a mapping file and an input file.

cat map.txt

test:replace

cat input.txt

The word test should be replaced.But the word testbook should not be replaced just because it has "_test" in it.

I am using the command below to find words in the input file and replace them with the values from the mapping file.

awk 'FNR==NR{ array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' FS=":" map.txt FS=" " input.txt

It searches the input file for each text listed in map.txt and replaces it with the word that follows the ":". In the example above, "test" is replaced with "replace".

Current result:

The word replace should be replaced.But the word replacebook should not be replaced just because it has "_replace" in it.

Expected Result:

The word replace should be replaced.But the word testbook should not be replaced just because it has "_test" in it.

What I need is for the word to be replaced only when it appears on its own. If it is attached to any other characters, it should be left alone.

Any help is appreciated.

Thanks in advance.

Upvotes: 0

Views: 196

Answers (2)

James Brown

Reputation: 37454

Loop over all the words and replace where needed:

$ awk '
NR==FNR {                     # hash the map file
    a[$1]=$2
    next
}
{
    for(i=1;i<=NF;i++)        # loop over every word
        if($i in a)           # ... and if it is hashed...
            $i=a[$i]          # ... replace it
}1
' FS=":" map FS=" " input
The word replace should be replaced.But the word testbook should not be replaced just because it has "_test" in it.

Edit: Using match to extract words from strings to preserve punctuation:

$ cat input2
Replace would Yoda test.
$ awk '
NR==FNR {                     # hash the map file
    a[$1]=$2
    next
}
{
    for(i=1;i<=NF;i++) {
        # a check belongs here to weed out obvious non-word-punctuation pairs, e.g.:
        # if($i ~ /^[a-zA-Z]+[,.!?]/)
        match($i,/^[a-zA-Z]+/)       # match a word at the beginning of the field
        w=substr($i,RSTART,RLENGTH)  # extract word
        if(w in a)                   # match in a
            sub(w,a[w],$i)
    }
}1' FS=":" map FS=" " input2
Replace would Yoda replace.

Upvotes: 1

Ed Morton

Reputation: 204258

With GNU awk for word boundaries:

awk -F':' '
NR==FNR { map[$1] = $2; next }
{
    for (old in map) {
        new = map[old]
        gsub("\\<"old"\\>",new)
    }
    print
}
' map input

The above will fail if old contains regexp metacharacters or escape characters, or if new contains &, but as long as both consist only of word-constituent characters it will be fine.
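
If the map might contain such characters in new, one option that should work in any POSIX awk is to avoid building regexps from the data at all. The following is only a minimal sketch, not a drop-in replacement. It assumes a word is a run of letters, digits, and underscores (the same characters GNU awk treats as word constituents for \< and \>). The function name replace_words is hypothetical, chosen for illustration.

```shell
# replace_words MAPFILE INPUTFILE   (hypothetical helper name)
# Replaces whole words by literal lookup, so no regexp is built from the map.
replace_words() {
    awk -F':' '
    NR==FNR { map[$1] = $2; next }          # load old:new pairs from the map file
    {
        out  = ""
        rest = $0
        # peel off one run of word characters at a time
        while (match(rest, /[A-Za-z0-9_]+/)) {
            word = substr(rest, RSTART, RLENGTH)
            if (word in map)                # exact, literal comparison
                word = map[word]
            out  = out substr(rest, 1, RSTART - 1) word
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest                      # append whatever follows the last word
    }
    ' "$1" "$2"
}
```

Called as replace_words map input, this leaves testbook and "_test" alone, and an & in new is inserted literally. The trade-off is that a key containing non-word characters (for example a.c) can never match, because the line is split into words before the lookup.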

Upvotes: 1
