Reputation: 1

How can I use awk to replace data in one file with updated data in a second file in a unix shell?

Let's say that I have two files. File 1 has the original data. File 2, which was recently created, has updated values for some of the lines, and those values need to replace the corresponding lines in file 1.

Here's an example of the format of file 1, which contains 10000 lines of data:

1000001 aaaaaaa aaaaaaa 123
1000002 aaaaaab aaaaaab 123
.
.
.
1000503 xxxxxxa xxxxxxa 123
.
.
.
1010000 zzzzzzl zzzzzzl 123

File 2 contains 1054 lines with updated values in the same format as file 1; however, many of them are not contiguous. For example, line 1000503 in file 2 would read as follows:

1000503 xxxxxxb xxxxxxb 245

Upvotes: 0

Answers (2)

Michael Ekstrand

Reputation: 29090

This will do it, using join and awk, assuming the files are in key order:

join -a1 -j1 file1 file2 \
    | awk '{ if (NF > 4) print $1, $5, $6, $7; else print $0 }'

This accomplishes the job in two steps, connected by a pipe.

Combining the files

First, join takes two text files that have a common column (in this case, your leading number column) and "joins" them like a database join. By default, it prints only the lines whose key appears in both files. -j1 tells it to join on the first field, which is also the default; -j is not specified by POSIX, so -1 1 -2 1 is the more portable spelling. -a1 tells it to also print the lines from file 1 that have no corresponding line in file 2.

join does require that both files be sorted on the join field.
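If the inputs are not already in key order, they can be sorted first. The following is a minimal sketch, not part of the original answer: the three-line file1 and one-line file2 are made-up sample data, and the .sorted file names are placeholders. It assumes that setting LC_ALL=C for both sort and join is acceptable, so that the two tools agree on ordering.

```shell
# Scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical inputs in the question's four-column format; file1 is deliberately unsorted.
printf '%s\n' '1000003 ccc ccc 123' '1000001 aaa aaa 123' '1000002 bbb bbb 123' > "$workdir/file1"
printf '%s\n' '1000002 bbx bbx 245' > "$workdir/file2"

# Sort on the first field only; LC_ALL=C gives sort and join the same byte-wise collation.
LC_ALL=C sort -k1,1 "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort -k1,1 "$workdir/file2" > "$workdir/file2.sorted"

# Field 1 is join's default key, so only -a1 is needed here.
joined=$(LC_ALL=C join -a1 "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$joined"

rm -r "$workdir"
```

Because every key in the question has the same number of digits, this textual sort order matches numeric order.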

This results in a copy of file 1 that also includes the matching lines from file 2, like so:

1000001 aaaaaaa aaaaaaa 123
1000002 aaaaaab aaaaaab 123
.
.
.
1000503 xxxxxxa xxxxxxa 123 xxxxxxb xxxxxxb 245
.
.
.
1010000 zzzzzzl zzzzzzl 123

Splitting the fields

We now have a problem: each matched line contains data from both files, whereas we want the file 2 data to replace the file 1 data. I couldn't see a way to make join do this on its own, so awk to the rescue.

The Awk code is pretty simple. If the number of fields (the NF variable) is greater than 4, we have a joined line; in that case, print fields 1, 5, 6, and 7. Otherwise, print the whole original line (since it is unjoined).

This will emit each unmatched line unmodified, and the file2 version of each matched line.
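To see the whole pipeline on a small input, here is a sketch that uses three made-up lines for file 1 and one for file 2 (the data and the scratch directory are assumptions for illustration). For comparison, it also runs an awk-only alternative that is not from this answer: it loads file 2 into an array keyed on the first column and needs no sorting, at the cost of holding file 2 in memory.

```shell
workdir=$(mktemp -d)

# Hypothetical sample data, already sorted by the key in column 1.
printf '%s\n' '1000001 aaaaaaa aaaaaaa 123' \
              '1000002 aaaaaab aaaaaab 123' \
              '1000503 xxxxxxa xxxxxxa 123' > "$workdir/file1"
printf '%s\n' '1000503 xxxxxxb xxxxxxb 245' > "$workdir/file2"

# join + awk, following the answer (field 1 is join's default key, so -j1 is omitted).
# Joined lines have 7 fields; unjoined lines keep their original 4.
merged=$(join -a1 "$workdir/file1" "$workdir/file2" \
    | awk '{ if (NF > 4) print $1, $5, $6, $7; else print $0 }')
printf '%s\n' "$merged"

# Alternative, not from the answer: read file2 first (NR == FNR) and store each
# line under its key, then print the stored line whenever file1's key matches.
awk_only=$(awk 'NR == FNR { updated[$1] = $0; next }
                { print (($1 in updated) ? updated[$1] : $0) }' \
               "$workdir/file2" "$workdir/file1")

rm -r "$workdir"
```

On this sample, both approaches produce the same three lines, with the 1000503 line taken from file 2.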

Upvotes: 1

perreal

Reputation: 97918

Using sed:

join -a 1 in1 in2 | sed 's/^\([0-9]*\) [^ ]* [^ ]* [^ ]* /\1 /'
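A quick way to check the substitution is to run it on a tiny input. The sketch below uses two made-up lines for in1 and one for in2 (sample data assumed for illustration). The pattern requires a space after the fourth field, so it matches only the seven-field joined lines; unjoined four-field lines pass through unchanged.

```shell
workdir=$(mktemp -d)

# Hypothetical sample data, already sorted by the key in column 1.
printf '%s\n' '1000001 aaaaaaa aaaaaaa 123' \
              '1000503 xxxxxxa xxxxxxa 123' > "$workdir/in1"
printf '%s\n' '1000503 xxxxxxb xxxxxxb 245' > "$workdir/in2"

# On joined lines, keep the key and drop the three old fields that follow it.
sed_merged=$(join -a 1 "$workdir/in1" "$workdir/in2" \
    | sed 's/^\([0-9]*\) [^ ]* [^ ]* [^ ]* /\1 /')
printf '%s\n' "$sed_merged"

rm -r "$workdir"
```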

Upvotes: 1
