Reputation: 1
Let's say I have two files: file 1 has the original data, and file 2, which was recently created, has updated values for some of that data. Those updated lines need to replace the corresponding lines in file 1.
Here's an example of the format of file 1, which contains 10000 lines of data:
1000001 aaaaaaa aaaaaaa 123
1000002 aaaaaab aaaaaab 123
.
.
.
1000503 xxxxxxa xxxxxxa 123
.
.
.
1010000 zzzzzzl zzzzzzl 123
File 2 contains 1054 lines of updated values in the same format as file 1; however, many of them are not contiguous. For example, line 1000503 in file 2 reads as follows:
1000503 xxxxxxb xxxxxxb 245
Upvotes: 0
Views: 221
Reputation: 29090
This will do it, using join and awk, assuming the files are in key order:
join -a1 -j1 file1 file2 \
| awk '{ if (NF > 4) print $1, $5, $6, $7; else print $0 }'
This does the job in two stages.
First, join takes two text files that have a common column (in this case, your leading number column) and "joins" them like a database join. By default, it prints only the lines whose key appears in both files. -j1 tells it to join on the first field, and -a1 tells it to also print the lines from file 1 that have no corresponding line in file 2.
join does require that both files be sorted by key.
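If the files are not already in key order, they can be sorted before joining. The following is a minimal sketch, not part of the original answer: the sample rows and the .sorted file names are made up for illustration, and LC_ALL=C is an assumption used to keep sort and join agreeing on the ordering. The join key is spelled with the POSIX -1 and -2 options here.

```shell
# Scratch directory with tiny stand-in files (illustrative data, deliberately out of order).
tmp=$(mktemp -d)
printf '%s\n' '1000503 xxxxxxa xxxxxxa 123' '1000001 aaaaaaa aaaaaaa 123' > "$tmp/file1"
printf '%s\n' '1000503 xxxxxxb xxxxxxb 245' > "$tmp/file2"

# Sort both files on the first field; the same locale is used for sort and join.
LC_ALL=C sort -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort -k1,1 "$tmp/file2" > "$tmp/file2.sorted"

# join now receives its inputs in the order it expects.
joined=$(LC_ALL=C join -a 1 -1 1 -2 1 "$tmp/file1.sorted" "$tmp/file2.sorted")
printf '%s\n' "$joined"

rm -rf "$tmp"
```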
This results in a copy of file 1 that also includes the matching lines from file 2, like so:
1000001 aaaaaaa aaaaaaa 123
1000002 aaaaaab aaaaaab 123
.
.
.
1000503 xxxxxxa xxxxxxa 123 xxxxxxb xxxxxxb 245
.
.
.
1010000 zzzzzzl zzzzzzl 123
We now have a problem: the matched line contains data from both files, whereas we want the file 2 data to replace the file 1 data. I couldn't see a way to make join do this on its own, so awk to the rescue.
The Awk code is pretty simple. If the number of fields (the NF variable) is greater than 4, we have a joined line; in that case, print fields 1, 5, 6, and 7. Otherwise, print the whole original line (since it is unjoined).
This will emit each unmatched line unmodified, and the file2 version of each matched line.
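To see the whole pipeline at work, here is a small sketch run against three stand-in rows based on the question's samples. The temporary directory and the merged variable are assumptions for illustration only, and the join key is spelled with the POSIX -1 and -2 options.

```shell
# Scratch files with a few illustrative rows, already in key order.
tmp=$(mktemp -d)
printf '%s\n' \
  '1000001 aaaaaaa aaaaaaa 123' \
  '1000503 xxxxxxa xxxxxxa 123' \
  '1010000 zzzzzzl zzzzzzl 123' > "$tmp/file1"
printf '%s\n' '1000503 xxxxxxb xxxxxxb 245' > "$tmp/file2"

# Join on the first field, keep unmatched file1 lines,
# then print the file2 fields wherever a line was joined.
merged=$(join -a 1 -1 1 -2 1 "$tmp/file1" "$tmp/file2" \
  | awk '{ if (NF > 4) print $1, $5, $6, $7; else print $0 }')
printf '%s\n' "$merged"

rm -rf "$tmp"
```

Only the 1000503 row should come out with the file 2 values; the other two rows pass through unchanged.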
Upvotes: 1
Reputation: 97918
Using sed:
join -a 1 in1 in2 | sed 's/^\([0-9]*\) [^ ]* [^ ]* [^ ]* /\1 /'
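A short sketch of why the sed expression leaves unmatched lines alone: the pattern requires a space after the fourth field, which only joined (seven-field) lines have. The two input lines below are illustrative stand-ins for what join -a 1 would emit, and the variable names are assumptions.

```shell
# Two lines as join -a 1 would emit them: one unmatched, one matched.
joined=$(printf '%s\n' \
  '1000001 aaaaaaa aaaaaaa 123' \
  '1000503 xxxxxxa xxxxxxa 123 xxxxxxb xxxxxxb 245')

# Drop fields 2-4 only when something follows them; four-field lines do not match.
result=$(printf '%s\n' "$joined" \
  | sed 's/^\([0-9]*\) [^ ]* [^ ]* [^ ]* /\1 /')
printf '%s\n' "$result"
```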
Upvotes: 1