Reputation: 33
I want to find lines common to two files and replace the line that follows each match in the first file with the line that follows it in the second file. Any solution (sed, awk, Perl, Bash) is welcome.
The comparison is case-insensitive and there can be multiple occurrences of the same line.
File 1:
hgacdavd
sndm,ACNMSDC
msgid "Rome"
msgstr ""
kgcksdcgfkdsb
msgid ""
hsdvchgsdvc
msgstr ""
dhshfjksdfhmd
msgid "Vidya"
msgstr ""
sdjhcbnd
dcndnv
cfnkdndvrknvkf
dfkvrnkdfnk
snfvrkng
msgid "Rome"
msgstr ""
wdbhkjbcfj
#dmcdmf
f,nvdf,
fvnfnvk
vfmf,mv
vfn
msgid "vid"
msgstr ""
dmcbdmbcvmfbvmkhsdk
File 2:
dfhkvgjbfrvkf
msgid "Rome"
msgstr "new bie"
sdbsjbcdcbwoido
fjcdcvnm
msgid "vidya"
msgstr "expert"
dvnjfkdvhnkfvnknsbdjh
msgid "vid"
msgstr "newton"
dfenfjdbrfjbvlfnvl
dcnkncvkdfvknfv
fcndkbvknfkv
vfdnkvnfknbvkfn
Later, file 1 should be:
hgacdavd
sndm,ACNMSDC
msgid "Rome"
msgstr "new bie"
kgcksdcgfkdsb
msgid ""
hsdvchgsdvc
msgstr ""
dhshfjksdfhmd
msgid "Vidya"
msgstr "expert"
sdjhcbnd
dcndnv
cfnkdndvrknvkf
dfkvrnkdfnk
snfvrkng
msgid "Rome"
msgstr "new bie"
wdbhkjbcfj
#dmcdmf
f,nvdf,
fvnfnvk
vfmf,mv
vfn
msgid "vid"
msgstr "newton"
dmcbdmbcvmfbvmkhsdk
Upvotes: 0
Views: 189
Reputation: 124
Check my source code at https://github.com/NguyenDuyPhong/merge_two_po_files and open the main page at .../merge_two_po_files/web (e.g. http://localhost/merge_two_po_files/web )
Upvotes: 0
Reputation: 113834
In this version of the question, the lines with msgid and msgstr appear in pairs and are separated from other lines by a blank line. Here is a (long) one-line solution for this case:
$ awk -F'"' 'BEGIN{RS="\n\n";OFS=""} NR==FNR {c[tolower($2)]=$4; next} {print $1,"\"",$2,"\"",$3,"\"",c[tolower($2)],"\"\n"}' file2 file1
msgid "Rome"
msgstr "new bie"
msgid "Vidya"
msgstr "expert"
msgid "Rome"
msgstr "new bie"
msgid "vid"
msgstr "newton"
MORE: The version below updates file1 with the new information:
$ awk -F'"' 'BEGIN{RS="\n\n";OFS=""} NR==FNR {c[tolower($2)]=$4; next} {print $1,"\"",$2,"\"",$3,"\"",c[tolower($2)],"\"\n"}' file2 file1 >tmp && mv tmp file1
How it works: Let's break it down into parts. The first part is just setup:
$ awk -F'"' 'BEGIN{RS="\n\n";OFS=""}
To understand the above, one needs to know that awk breaks a file up into 'records', and then it breaks records up into 'fields'. The above says that every time a blank line appears (two newline characters in a row), treat what follows as a new 'record'. In other words, the record separator is two newlines: RS="\n\n". It also says that records should be broken up into fields at each double-quote: -F'"'. Finally, it says that, when printing our output, awk should not add any additional spaces to what we have. In other words, the output field separator is the empty string: OFS="".
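As a small illustration of the record and field splitting described above, the sketch below feeds two hypothetical msgid/msgstr pairs to awk and prints the second and fourth fields of each record. It assumes an awk that accepts a multi-character RS, such as gawk or mawk (POSIX leaves that case unspecified). The sample data and the variable name fields are made up for the example.

```shell
# Hypothetical sample: two msgid/msgstr pairs separated by a blank line.
# Assumes gawk or mawk, which accept a multi-character RS.
fields=$(printf 'msgid "Rome"\nmsgstr "new bie"\n\nmsgid "vid"\nmsgstr "newton"\n' |
  awk -F'"' 'BEGIN{RS="\n\n"} {print NR ": id=" $2 ", str=" $4}')

# Each record is one pair; $2 is the msgid text, $4 the msgstr text.
printf '%s\n' "$fields"
```

With the sample above this should print "1: id=Rome, str=new bie" followed by "2: id=vid, str=newton".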
The next part is:
... NR==FNR {c[tolower($2)]=$4; next} ...
This says that, while reading the first file named on the command line (file2), awk should create an associative array (like a dictionary) called c. The keys for the array are the msgid's. The values are the msgstr's. Thus, c[rome]=new bie. We apply tolower to the msgid's so that all the keys are consistently lower case. The next command tells awk to skip the remaining commands and start over with the next record.
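To make the lookup table concrete, here is a minimal sketch that builds the array c from hypothetical file2-style input and dumps it in an END block. The END block, the pipe through sort, and the variable name lookup are additions for illustration only and are not part of the one-liner. As before, it assumes gawk or mawk for the two-newline RS.

```shell
# Hypothetical file2-style input with two pairs.
# Build c[lowercased msgid] = msgstr, then print every key/value pair.
# "for (k in c)" has no defined order, so the output is sorted.
lookup=$(printf 'msgid "Rome"\nmsgstr "new bie"\n\nmsgid "vidya"\nmsgstr "expert"\n' |
  awk -F'"' 'BEGIN{RS="\n\n"} {c[tolower($2)]=$4} END{for (k in c) print k " -> " c[k]}' |
  sort)

printf '%s\n' "$lookup"
```

The keys come out lower-cased ("rome", "vidya"), which is what makes the later lookup case-insensitive.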
The NR==FNR part above probably looks obscure. To understand it, one needs to know that awk counts the number of records that it has seen and assigns that value to NR. It also counts the number of records that it has seen from the file that it is currently reading and assigns that to FNR. So, when we are reading the first file, it follows that the two are equal: NR==FNR. When awk starts reading the second file, then NR>FNR and the block of code will be skipped.
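The behaviour of NR and FNR can be seen with two small throwaway files. This is only a sketch: the file names first and second, their contents, and the variable counts are hypothetical.

```shell
# Create two small throwaway files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first"
printf 'c\nd\ne\n' > "$tmpdir/second"

# Print the global record count, the per-file record count,
# and whether the NR==FNR test holds for that record.
counts=$(awk '{print NR, FNR, (NR==FNR ? "first file" : "later file")}' \
  "$tmpdir/first" "$tmpdir/second")
printf '%s\n' "$counts"

rm -rf "$tmpdir"
```

NR keeps counting across files (1 to 5 here) while FNR restarts at 1 for the second file, so NR==FNR holds only for the first two records. One caveat: if the first file is empty, NR==FNR is also true while reading the second file.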
The last part is:
... {print $1,"\"",$2,"\"",$3,"\"",c[tolower($2)],"\"\n"}' file2 file1
This part is executed while reading the second file (file1) and consists of just one print statement. It prints the msgid line, including the quotes around the msgid: $1,"\"",$2,"\"". It then prints the msgstr line, looking up the value of the msgstr in our associative array c, putting quotes around it, and adding a newline character at the end: $3,"\"",c[tolower($2)],"\"\n". Here $3 holds the newline and the text msgstr that sit between the two quoted strings.
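The way the print statement reassembles a record can be checked on a single pair. In this sketch the array c is filled by hand in the BEGIN block instead of being read from file2; that shortcut, the sample record, and the variable name rebuilt are assumptions made to keep the example short. It again assumes gawk or mawk.

```shell
# One hypothetical record with an empty msgstr; c is pre-filled by hand
# here instead of being built from file2.
rebuilt=$(printf 'msgid "Rome"\nmsgstr ""\n' |
  awk -F'"' 'BEGIN{RS="\n\n";OFS="";c["rome"]="new bie"}
    {print $1,"\"",$2,"\"",$3,"\"",c[tolower($2)],"\"\n"}')

# $1 is the text before the first quote, $2 the msgid,
# $3 the newline plus the msgstr keyword; the old $4 is discarded.
printf '%s\n' "$rebuilt"
```

This should print the msgid "Rome" line unchanged, followed by msgstr "new bie".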
In this version, the msgid and msgstr lines are not necessarily adjacent. Consequently, every time we run across an msgid line, we save its value to the variable id. When going through file2, we look for msgstr lines and store their value in the associative array c. Then, when processing file1, we substitute c[id] in msgstr lines:
awk -F'"' 'NR==FNR && $1=="msgid " {id=tolower($2)} NR==FNR && $1=="msgstr " {c[id]=$2} NR==FNR {next} $1=="msgid " {id=tolower($2)} {if ($1=="msgstr ") print "msgstr \"" c[id] "\""; else print $0}' file2 file1 >tmp && mv tmp file1
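For readers who want to see this command at work, the sketch below runs the same awk program, split over several lines for readability, on two small hypothetical files in which a msgid and its msgstr are not always adjacent. The sample contents, the temporary directory, and the variable name merged are assumptions. The result is captured in a variable instead of being moved over file1.

```shell
tmpdir=$(mktemp -d)
# Hypothetical sample data; note the unrelated lines between the pairs.
printf 'msgid "Rome"\nmsgstr "new bie"\nnoise\nmsgid "vidya"\nmsgstr "expert"\n' > "$tmpdir/file2"
printf 'header\nmsgid "rome"\nmsgstr ""\nmsgid "Vidya"\nother line\nmsgstr ""\n' > "$tmpdir/file1"

merged=$(awk -F'"' '
  # While reading file2: remember the latest msgid, store its msgstr.
  NR==FNR && $1=="msgid "  {id=tolower($2)}
  NR==FNR && $1=="msgstr " {c[id]=$2}
  NR==FNR {next}
  # While reading file1: track the msgid and rewrite msgstr lines.
  $1=="msgid " {id=tolower($2)}
  {if ($1=="msgstr ") print "msgstr \"" c[id] "\""; else print $0}
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

Both empty msgstr lines in the sample file1 get filled in ("new bie" and "expert"), even though "other line" sits between msgid "Vidya" and its msgstr, and all other lines pass through unchanged.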
Upvotes: 1
Reputation: 58391
This might work for you (GNU sed):
sed -rn '/msgid/{$!N;s|(.*)\n(.*)|/\1/I{n;s/.*/\2/}|p}' file2 | sed -rf - file1
Turn file2 into a sed script which is run against file1.
The first sed turns each line containing msgid in file2, together with the line following it, into a command that matches that msgid line (case-insensitively, because of the I flag), prints it, and replaces the next line with the second line of the pair from file2.
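To see what the generated script looks like, the following sketch runs the two stages separately on small hypothetical files. It uses -n together with the p flag so that only the generated commands are emitted, and it assumes GNU sed (for -r, the I flag, and reading the script from standard input with -f -). The sample data and the variable names script and result are illustrative.

```shell
tmpdir=$(mktemp -d)
# Hypothetical sample data.
printf 'noise\nmsgid "Rome"\nmsgstr "new bie"\n' > "$tmpdir/file2"
printf 'msgid "rome"\nmsgstr ""\nother\n' > "$tmpdir/file1"

# Stage 1: turn each msgid/msgstr pair of file2 into one sed command.
script=$(sed -rn '/msgid/{$!N;s|(.*)\n(.*)|/\1/I{n;s/.*/\2/}|p}' "$tmpdir/file2")
printf '%s\n' "$script"

# Stage 2: run the generated commands against file1.
result=$(printf '%s\n' "$script" | sed -rf - "$tmpdir/file1")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

For this sample the generated script is the single command /msgid "Rome"/I{n;s/.*/msgstr "new bie"/}. Note that a msgid or msgstr containing characters that are special to sed, such as / or &, would need escaping before being used this way.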
Upvotes: 0