Awk--Editing field values in two files based on another file's content

Question

I have three files: A.txt, B.txt and C.txt. A.txt and B.txt have the same number of lines, with a single field on each line, like:

A.txt

m.1
m.2
m.33
m.5
m.4
m.6

B.txt

A
B
CC
D
CC
G

and C.txt is a two-column file in which each line consists of elements from A.txt. Something like:

C.txt

m.1 m.33
m.2 m.6
m.33 m.4
m.5 m.7
m.4 m.823
m.6 m.2

What I need to do is check each line of B.txt and, if it is either "G" or "CC", replace the "m" prefix on the corresponding line of A.txt with that value, and make the same replacement wherever that entry appears in C.txt. Like:

A.txt

m.1
m.2
CC.33
m.5
CC.4
G.6

C.txt

m.1 CC.33
m.2 G.6
CC.33 CC.4
m.5 m.7
CC.4 m.823
G.6 m.2

Tom Fenech · Accepted Answer

This awk script does what you want:

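# Split on whitespace and dots, so "m.33<TAB>CC" becomes $1="m", $2="33", $3="CC".
BEGIN { FS="[[:space:].]+" }

# First input: the pasted A.txt/B.txt pairs.
NR == FNR {
    # The regex is unanchored, so it matches any B.txt value containing "CC" or "G".
    if ($3 ~ /CC|G/) { $0 = $3 "." $2; swap[$0]++ }   # remember e.g. "CC.33"
    else $0 = $1 "." $2
    print > "A_new.txt"
    next
}

# Second input: C.txt, where each line splits into prefix/number pairs.
{
    for (i=2; i<=NF; i+=2) {
        for (key in swap) {
            split(key, k)              # k[1] = new prefix, k[2] = number
            if ($i == k[2]) {          # match on the number only
                $(i-1) = k[1]
                $i = k[2]
            }
        }
        $(i/2) = $(i-1) "." $i         # rebuild the pair into $1, then $2
    }
    print $1, $2 > "C_new.txt"
}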

Run it like this:

awk -f merge.awk <(paste A.txt B.txt) C.txt

The first block operates on the first input. I have used paste to combine A.txt and B.txt, so the input looks like this:

$ paste A.txt B.txt
m.1     A
m.2     B
m.33    CC
m.5     D
m.4     CC
m.6     G
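As a quick check of how the field separator from the BEGIN block treats one of these pasted lines, the sketch below feeds a single sample line through awk and prints the resulting fields. The function name split_demo is only an illustrative helper and not part of the answer's script.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): show how FS="[[:space:].]+"
# splits one line of the paste output into three fields.
split_demo() {
    printf 'm.33\tCC\n' |
        awk 'BEGIN { FS = "[[:space:].]+" }
             { printf "NF=%d $1=%s $2=%s $3=%s\n", NF, $1, $2, $3 }'
}

split_demo
# prints: NF=3 $1=m $2=33 $3=CC
```

Because the dot and the tab are both separators, the prefix, the number and the B.txt value each land in their own field, which is what lets the first block rebuild the line as $3 "." $2.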

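The script is similar to the first version of this answer, with a few tweaks. I have removed the previous explanation because some of it is no longer applicable, but hopefully it still reads fairly clearly. In short, the first block writes A_new.txt and records each renamed entry in swap; the second block splits every line of C.txt into prefix/number pairs, replaces the prefix of any number found in swap, and writes the result to C_new.txt. The output: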

$ cat A_new.txt 
m.1
m.2
CC.33
m.5
CC.4
G.6
$ cat C_new.txt 
m.1 CC.33
m.2 G.6
CC.33 CC.4
m.5 m.7
CC.4 m.823
G.6 m.2
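For comparison, here is a minimal sketch of a different approach, not the method used in the answer above. It keys a lookup table on the full original entry (such as m.33) rather than on the number alone, so an entry in C.txt is rewritten only if exactly that entry appears in A.txt. The function name remap and the array name map are hypothetical, and the sketch assumes that entries contain no whitespace and that the B.txt values to act on are exactly CC or G. It also passes the paste output to awk on standard input via the - operand, which avoids the need for process substitution.

```shell
#!/bin/sh
# Hypothetical helper: remap A_FILE B_FILE C_FILE
# Writes A_new.txt and C_new.txt in the current directory.
remap() {
    paste "$1" "$2" | awk '
        # First input (standard input): "<entry> <value>" pairs from paste.
        NR == FNR {
            new = $1
            if ($2 == "CC" || $2 == "G") {
                sub(/^[^.]*/, $2, new)   # swap the prefix before the first dot
                map[$1] = new            # remember e.g. map["m.33"] = "CC.33"
            }
            print new > "A_new.txt"
            next
        }
        # Second input (the C file): replace any field that has a map entry.
        {
            for (i = 1; i <= NF; i++)
                if ($i in map) $i = map[$i]
            print > "C_new.txt"
        }
    ' - "$3"
}
```

With the sample files from the question, calling remap A.txt B.txt C.txt should produce the same A_new.txt and C_new.txt as shown above. The trade-off is that the exact-match test is stricter than the regex in the original script, so B.txt values that merely contain CC or G are left alone.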
