user1540393

Reputation: 69

How to replace a particular value of a field (column) in a file with a value from another file in shell scripting?

I have two files, say A.txt and B.txt. A.txt has three columns and looks like this:

0 0 17
0 1 17
0 2 4
0 3 50
0 4 90 
....
.... 

I have to replace the third-column values with their corresponding mapped values, which are stored in B.txt. B.txt looks like this:

1 1
2 1
3 1
4 1
..
17 5
..
50 8
..
90 11
..

The values in the first column of B.txt are the same as the values in the third column of A.txt. I need to create a new file (say C.txt) whose first two columns are the same as those of A.txt, but whose third column contains the corresponding mapped values. A sample of C.txt looks like this:

0 0 5
0 1 5
0 2 1
0 3 8
0 4 11
....
....

NOTE

I have to do this for 400000 files, so speed matters. I have written a script for this, but it runs very slowly. If replacing the values in place is faster than creating a new file (C.txt), that solution is also acceptable.

while read line
do

     origPhoneme=`echo $line| cut -d " " -f3` 
     while read mapLine
     do
        mapPhone=`echo $mapLine | cut -d " " -f1`
        replacementPhone=`echo $mapLine | cut -d " " -f2`
        if [ $mapPhone == $origPhoneme ]
        then
             echo $replacementPhone >> checkFile
             break
        fi
     done < B.txt
done< A.txt

paste -d " " A.txt checkFile > C.txt

With this code, C.txt still contains the original third column of A.txt (the mapped value is appended as a fourth column), which I don't want.

Upvotes: 1

Views: 987

Answers (1)

Tim Pietzcker

Reputation: 336468

Python (or shell scripts) should be fast enough - your task is mainly limited by I/O speed, not processing speed.

So I would suggest a Python approach like this:

Read B.txt into a dictionary for fast lookup:

with open("B.txt") as mapfile:
    # Each "key value" line becomes one dictionary entry (both kept as strings)
    B = dict(line.strip().split() for line in mapfile)

Then process A.txt, creating C.txt:

with open("A.txt") as infile, open("C.txt", "w") as outfile:
    for line in infile:
        start, end = line.strip().rsplit(None, 1)
        outfile.write("{0} {1}\n".format(start, B[end]))
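Since the same mapping has to be applied to many files, the two steps above could be wrapped in functions so that B.txt is read only once. The following is a minimal sketch, not a tuned implementation. The function names load_mapping and remap_last_column are made up for illustration. It also makes two assumptions that go beyond the question: lines in the map file that do not have exactly two fields are skipped, and a value with no entry in the map is written unchanged instead of raising a KeyError.

```python
def load_mapping(map_path):
    """Read a two-column file into a dict of strings.

    Assumption: lines without exactly two whitespace-separated
    fields (blank lines, placeholders) are silently skipped.
    """
    mapping = {}
    with open(map_path) as mapfile:
        for line in mapfile:
            fields = line.split()
            if len(fields) == 2:
                mapping[fields[0]] = fields[1]
    return mapping


def remap_last_column(mapping, in_path, out_path):
    """Copy in_path to out_path, replacing the last field of each line."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            # Assumption: values missing from the map are kept as they are.
            parts[-1] = mapping.get(parts[-1], parts[-1])
            # Fields are re-joined with single spaces.
            outfile.write(" ".join(parts) + "\n")
```

A typical use would be to call load_mapping("B.txt") once and then call remap_last_column(mapping, "A.txt", "C.txt") in a loop over the input files, so the dictionary is built only once for the whole batch.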

Upvotes: 4
