user3057111

Reputation: 97

Search and replace strings in a file based on a column in another file

Suppose the first file looks like this:

(a.txt)
 1  asm
 2  assert
 3  bio
 4  Bootasm
 5  bootmain
 6  buf
 7  cat
 8  console
 9  defs
10  echo

and the second looks like this:

(b.txt)
 bio cat BIO bootasm
 bio defs cat
 Bio console 
 bio BiO
 bIo assert
 bootasm asm
 bootasm echo
 bootasm console
 bootmain buf
 bootmain bio
 bootmain bootmain
 bootmain defs
 cat cat
 cat assert
 cat assert

and we want the output to look like this:

 3 7 3 4
 3 9 7
 3 8
 3 3
 3 2
 4 1
 4 10
 4 8
 5 6
 5 3
 5 5
 5 9
 7 7
 7 2
 7 2

For each line of the first file, we take the word in the second column and search for it in every column of every line of the second file; wherever it occurs, we replace it with the number from the first column of the first file. I managed to do this only for the first column and couldn't do it for the rest.

Here is the command I use:

awk 'NR==FNR{a[$2]=$1;next}{$1=a[$1];}1' a.txt b.txt

which gives:

3 cat bio bootasm
3 defs cat
3 console
3 bio
3 assert
4 asm
4 echo
4 console
5 buf
5 bio
5 bootmain
5 defs
7 cat
7 assert
7 assert

How can I do the same for the other columns?

Thank you.

Upvotes: 5

Views: 2104

Answers (3)

perreal

Reputation: 97918

awk 'NR==FNR{h[$2]=$1;next} {for (i=1; i<=NF;i++) $i=h[$i];}1' a.txt b.txt

NR is the global record number (the line number, by default) across all input files. FNR is the record number within the current file. The NR==FNR block therefore runs only while the global record number equals the per-file one, which is true only for the first file, i.e., a.txt (provided a.txt is not empty). The next statement in this block skips the rest of the script, so the for loop runs only for the second file, i.e., b.txt.
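To see the NR/FNR distinction in isolation, here is a small sketch that prints both counters while awk reads two files in turn. The scratch directory from mktemp and the file names first.txt and second.txt are assumptions made only for this demonstration.

```shell
#!/bin/sh
# Print NR (record number across all input files) next to
# FNR (record number within the current file) for two small files.
dir=$(mktemp -d)                  # scratch directory (assumption)
cd "$dir" || exit 1
printf 'one\ntwo\n'    > first.txt    # hypothetical sample files
printf 'three\nfour\n' > second.txt

# NR == FNR holds only while the first file is being read.
awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first file" : "later file") }' \
    first.txt second.txt
```

This should print four lines; the third one reads `second.txt 3 1 later file`, which is exactly where NR and FNR diverge and the NR==FNR block stops firing.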

First, we process the first file to store the word ids in an associative array: NR==FNR{h[$2]=$1;next}. After that, we use these ids to map the words in the second file. The for loop (for (i=1; i<=NF;i++) $i=h[$i];) iterates over all fields, so $i=h[$i] replaces the word in the ith column with the id stored for it. A word that does not appear in a.txt has no entry in h, so it is replaced by an empty string. Finally, the 1 at the end of the script is an always-true condition with no action, so awk applies its default action and prints every (modified) line.
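The empty-string behaviour for unknown words can be checked directly. This sketch uses a hypothetical one-entry table and the input `asm foo`; the second command is a variant (not part of the answer) that tests membership with awk's `in` operator first, so unknown words are kept rather than blanked.

```shell
#!/bin/sh
# Plain lookup: h["foo"] does not exist, so field 2 becomes "".
printf 'asm foo\n' | awk 'BEGIN { h["asm"] = 1 }
    { for (i = 1; i <= NF; i++) $i = h[$i]; print "[" $0 "]" }'

# Guarded lookup: only fields that are keys of h are replaced.
printf 'asm foo\n' | awk 'BEGIN { h["asm"] = 1 }
    { for (i = 1; i <= NF; i++) if ($i in h) $i = h[$i]; print "[" $0 "]" }'
```

The brackets make the trailing empty field visible: the first command prints `[1 ]`, the second `[1 foo]`.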

Produces:

3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2

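Note that the sample b.txt mixes cases (BIO, Bio, bIo, and bootasm versus Bootasm in a.txt), so the output above is only obtained with case-insensitive matching. To make the script case-insensitive, add tolower calls to the array indices, both when storing and when looking up: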

awk 'NR==FNR{h[tolower($2)]=$1;next} {for (i=1; i<=NF;i++) $i=h[tolower($i)];}1' a.txt b.txt
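A self-contained way to try the case-insensitive command on the question's data is sketched below. It writes a.txt and b.txt into a scratch directory (leading spaces dropped), wraps the awk program in a shell function whose name, map_ids, is hypothetical, and adds the `in` guard so a word missing from a.txt would be kept instead of blanked; that guard is an assumption, not part of the answer.

```shell
#!/bin/sh
# Case-insensitive mapping of every column of b.txt to the ids in a.txt.
dir=$(mktemp -d)                  # scratch directory (assumption)
cd "$dir" || exit 1

# Sample data from the question, leading spaces dropped.
cat > a.txt <<'EOF'
1 asm
2 assert
3 bio
4 Bootasm
5 bootmain
6 buf
7 cat
8 console
9 defs
10 echo
EOF

cat > b.txt <<'EOF'
bio cat BIO bootasm
bio defs cat
Bio console
bio BiO
bIo assert
bootasm asm
bootasm echo
bootasm console
bootmain buf
bootmain bio
bootmain bootmain
bootmain defs
cat cat
cat assert
cat assert
EOF

# map_ids IDFILE DATAFILE (hypothetical helper name for this sketch)
map_ids() {
  awk 'NR == FNR { h[tolower($2)] = $1; next }   # first file: word -> id
       {
         for (i = 1; i <= NF; i++) {             # every column of the data file
           key = tolower($i)
           if (key in h) $i = h[key]             # replace known words only
         }
       }
       1' "$1" "$2"
}

map_ids a.txt b.txt
```

With this data every word has an id, so the guard changes nothing and the 15 output lines match the expected output in the question, starting with `3 7 3 4`.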

Upvotes: 9

NeronLeVelu

Reputation: 10039

{
 cat a.txt;  echo "--EndA--";cat b.txt
} | sed -n '1 h
1 !H
$ {
  x
: loop
  s/^ *\([[:digit:]]\{1,\}\) *\([^[:cntrl:]]*\)\(\n\)\(.*\)\2/\1 \2\3\4\1/
  t loop
  s/^ *[[:digit:]]\{1,\} *[^[:cntrl:]]*\n//
  t loop
  s/^[[:space:]]*--EndA--\n//
  p
  }
 '

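The script concatenates a.txt, a separator line, and b.txt, collects everything in the hold space, and at the last line repeatedly replaces occurrences of the word on the first remaining a.txt line with its number, deleting that a.txt line once its word no longer occurs. The "--EndA--" separator can be changed to another string if there is a chance it appears in one of the files (mainly a.txt). Caveat: the substitution matches substrings and is case-sensitive, so with this sample data asm is also replaced inside bootasm and Bootasm, and mixed-case forms such as BIO are left unchanged.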
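The substring caveat can be reproduced with a one-line sed call on a hypothetical input line; a whole-field comparison in awk is shown for contrast.

```shell
#!/bin/sh
# s/asm/1/g also rewrites the "asm" inside "bootasm".
printf 'bootasm asm\n' | sed 's/asm/1/g'

# Comparing whole fields only touches the field that is exactly "asm".
printf 'bootasm asm\n' | awk '{ for (i = 1; i <= NF; i++) if ($i == "asm") $i = 1 } 1'
```

The first command prints `boot1 1`, the second `bootasm 1`.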

Upvotes: 3

Gery

Reputation: 9036

Divide and conquer! A bit archaic, but it does the job =)

awk 'NR==FNR{a[$2]=$0;next}{$1=a[$1];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 1
awk 'NR==FNR{a[$2]=$0;next}{$1=a[$2];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 2
awk 'NR==FNR{a[$2]=$0;next}{$1=a[$3];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 3
awk 'NR==FNR{a[$2]=$0;next}{$1=a[$4];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 4
paste 1 2 3 4 | tr '\t' ' '

gives:

3 7 3 4
3 9 7 
3 8  
3 3  
3 2  
4 1  
4 10  
4 8  
5 6  
5 3  
5 5  
5 9  
7 7  
7 2  
7 2  

In this case I just change the column number in each command and paste the results together, with a bit of editing in between.
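The same per-column idea can be written as a loop. This is a sketch on a small hypothetical sample, not the answer's exact commands: each awk pass prints the id of column k (an empty line when the column is missing or unknown), `paste -d ' '` joins the per-column files with single spaces instead of tabs, and the tolower calls are an added assumption so mixed-case words also match.

```shell
#!/bin/sh
# One awk pass per column of b.txt, then paste the columns back together.
dir=$(mktemp -d)                       # scratch directory (assumption)
cd "$dir" || exit 1
printf '1 asm\n2 bio\n3 cat\n' > a.txt           # hypothetical sample
printf 'bio cat asm\nBIO cat\ncat\n' > b.txt

for k in 1 2 3; do
  # Column k of each b.txt line mapped to its id; $k is "" beyond NF.
  awk -v k="$k" 'NR == FNR { id[tolower($2)] = $1; next }
                 { print id[tolower($k)] }' a.txt b.txt > "col$k"
done

paste -d ' ' col1 col2 col3
```

The first line of output is `2 3 1`; shorter input lines yield trailing spaces (for example `2 3 ` for the second line), just like the output shown above.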

Upvotes: 3
