I want to find common lines in two files and replace the next line of the first file with the next line of the second file. Sed, awk, Perl, Bash: any solution is welcome. The comparison is case-insensitive, and the same line can occur multiple times.

File 1:

hgacdavd sndm,ACNMSDC
msgid "Rome"
msgstr ""
kgcksdcgfkdsb
msgid ""
hsdvchgsdvc
msgstr ""
dhshfjksdfhmd
msgid "Vidya"
msgstr ""
sdjhcbnd dcndnv cfnkdndvrknvkf dfkvrnkdfnk snfvrkng
msgid "Rome"
msgstr ""
wdbhkjbcfj #dmcdmf f,nvdf, fvnfnvk vfmf,mv vfn
msgid "vid"
msgstr ""
dmcbdmbcvmfbvmkhsdk

File 2:

dfhkvgjbfrvkf
msgid "Rome"
msgstr "new bie"
sdbsjbcdcbwoido fjcdcvnm
msgid "vidya"
msgstr "expert"
dvnjfkdvhnkfvnknsbdjh
msgid "vid"
msgstr "newton"
dfenfjdbrfjbvlfnvl dcnkncvkdfvknfv fcndkbvknfkv vfdnkvnfknbvkfn

Afterwards, file 1 should be:

hgacdavd sndm,ACNMSDC
msgid "Rome"
msgstr "new bie"
kgcksdcgfkdsb
msgid ""
hsdvchgsdvc
msgstr ""
dhshfjksdfhmd
msgid "Vidya"
msgstr "expert"
sdjhcbnd dcndnv cfnkdndvrknvkf dfkvrnkdfnk snfvrkng
msgid "Rome"
msgstr "new bie"
wdbhkjbcfj #dmcdmf f,nvdf, fvnfnvk vfmf,mv vfn
msgid "vid"
msgstr "newton"
dmcbdmbcvmfbvmkhsdk

Find common line in two files and replace the next line of first file with the next line of other file

Reputation: 124

You can use this app I wrote in PHP; it solves the same problem.
Input: two files are needed:
- file 1: acts as the dictionary for the second file
- file 2: the destination, which takes each "msgid/msgstr" pair from file 1 if it exists there.
Output: as you described.

The source code is at https://github.com/NguyenDuyPhong/merge_two_po_files and the main page is served at .../merge_two_po_files/web (for example, http://localhost/merge_two_po_files/web ).

Upvotes: 0

John1024

Reputation: 113834

Version 1 of question:

In this version of the question, the lines with msgid and msgstr appear in pairs and are separated from other lines with a blank line. Here is a one (long) line solution for this case:

$ awk -F'"' 'BEGIN{RS="\n\n";OFS=""} NR==FNR {c[tolower($2)]=$4; next} {print $1,"\"",$2,"\"",$3,"\"",c[tolower($2)],"\"\n"}' file2 file1
msgid "Rome"
msgstr "new bie"

msgid "Vidya"
msgstr "expert"

msgid "Rome"
msgstr "new bie"

msgid "vid"
msgstr "newton"

MORE: The version below updates file1 with the new information:

$ awk -F'"' 'BEGIN{RS="\n\n";OFS=""} NR==FNR {c[tolower($2)]=$4; next} {print $1,"\"",$2,"\"",$3,"\"",c[tolower($2)],"\"\n"}' file2 file1 >tmp && mv tmp file1

How it works: Let's break it down into parts. The first part is just setup:

$ awk -F'"' 'BEGIN{RS="\n\n";OFS=""}

To understand the above, one needs to know that awk breaks a file up into 'records', and then it breaks records up into 'fields'. The above says that every time a blank line appears (two newline characters in a row), treat what follows as a new 'record'. In other words the record separator is two newlines: RS="\n\n". It also says that records should be broken up into fields according to the appearance of a double-quote: -F'"'. Finally, it says that, when printing our output, do not add any additional spaces to what we have. In other words, the output field separator is the empty string: OFS=""
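To see the record and field splitting in isolation, here is a minimal sketch. The file name sample.po and its contents are made up for illustration, and it assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves the behaviour unspecified):

```shell
# Work in a scratch directory so nothing in the current one is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample: two msgid/msgstr pairs separated by a blank line.
printf 'msgid "Rome"\nmsgstr "new bie"\n\nmsgid "vid"\nmsgstr "newton"\n' > sample.po

# Each blank-line-separated pair is one record; splitting on the
# double-quote puts the msgid in field 2 and the msgstr in field 4.
fields=$(awk -F'"' 'BEGIN{RS="\n\n"} {print NR ": id=" $2 ", str=" $4}' sample.po)
echo "$fields"
```

Fields 1 and 3 hold the literal text around the quoted values (msgid and, after a newline, msgstr), which is why the one-liner prints them back unchanged.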

The next part is:

... NR==FNR {c[tolower($2)]=$4; next} ...

This says that, while reading the first file named on the command line (file2), create an associative array (like a dictionary) called c. The keys of the array are the msgids and the values are the msgstrs. Thus, c["rome"]="new bie". We apply tolower to the msgids so that all the keys are consistently lower case. The next command tells awk to skip the remaining commands and start over with the next record.
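The effect of tolower on the keys can be checked with a small sketch. The file dict.po below is a made-up stand-in for file2, and the END block exists only to inspect the array:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for file2, with mixed-case msgids.
printf 'msgid "Rome"\nmsgstr "new bie"\n\nmsgid "vidya"\nmsgstr "expert"\n' > dict.po

# Build the array c the same way the one-liner does, then look up keys
# whose case differs from the case used in the file.
lookups=$(awk -F'"' 'BEGIN{RS="\n\n"}
    {c[tolower($2)]=$4}
    END{print c[tolower("ROME")]; print c[tolower("Vidya")]}' dict.po)
echo "$lookups"
```

Because both the stored key and the lookup key pass through tolower, "Rome", "ROME" and "rome" all reach the same entry.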

The NR==FNR part above probably looks obscure. To understand it, one needs to know that awk counts the number of records that it has seen and assigns that value to NR. It also counts the number of records that it has seen from the file that it is currently reading and assigns that to FNR. So, when we are reading the first file, it follows that the two are equal: NR==FNR. When awk starts reading the second file, then NR>FNR and the block of code will be skipped.
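A quick way to watch the two counters is to print them for every line of two small files. The file names and contents here are hypothetical:

```shell
cd "$(mktemp -d)" || exit 1

# Two hypothetical two-line files.
printf 'a\nb\n' > first.txt
printf 'c\nd\n' > second.txt

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(awk '{print FILENAME, NR, FNR, (NR==FNR ? "first" : "later")}' first.txt second.txt)
echo "$counts"
```

One caveat with this idiom: if the first file is empty, NR==FNR is also true while the second file is being read, so the test no longer tells the files apart.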

The last part is:

... {print $1,"\"",$2,"\"",$3,"\"",c[tolower($2)],"\"\n"}' file2 file1

This part is executed for each record of the second file (file1) and consists of just one print statement. It prints the msgid line, including the quotes around the msgid: $1,"\"",$2,"\"". It then prints the msgstr line, looking up the value of the msgstr in our associative array c and putting quotes around it and a newline character at the end: $3,"\"",c[tolower($2)],"\"\n".

Version 2 of question:

In this version, the msgid and msgstr lines are not necessarily adjacent. Consequently, every time we run across an msgid line, we save its value to the variable id. When going through file2, we look for msgstr lines and store their value in the associative array c. Then, when processing file1, we substitute c[id] in msgstr lines:

$ awk -F'"' 'NR==FNR && $1=="msgid " {id=tolower($2)} NR==FNR && $1=="msgstr " {c[id]=$2} NR==FNR {next} $1=="msgid " {id=tolower($2)} {if ($1=="msgstr ") print "msgstr \"" c[id] "\""; else print $0}' file2 file1 >tmp && mv tmp file1
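The same logic may be easier to follow spread over several lines. The sketch below is a variant, not the command above: it adds an (id in c) guard, which is my own assumption about the desired behaviour, so that a msgstr in file1 is left untouched when file2 has no entry for its msgid. The sample files are made up and much shorter than the ones in the question:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical file1: the file to update, with unrelated lines mixed in.
cat > file1 <<'EOF'
hgacdavd
msgid "Rome"
msgstr ""
msgid "Vidya"
sdjhcbnd
msgstr ""
msgid "Paris"
msgstr "keep me"
EOF

# Hypothetical file2: the source of the new msgstr values.
cat > file2 <<'EOF'
msgid "Rome"
msgstr "new bie"
dfenfjdbrfjbvlfnvl
msgid "vidya"
msgstr "expert"
EOF

result=$(awk -F'"' '
    # First file on the command line (file2): build the lookup table.
    NR == FNR {
        if ($1 == "msgid ")  id = tolower($2)
        if ($1 == "msgstr ") c[id] = $2
        next
    }
    # Second file (file1): remember the most recent msgid ...
    $1 == "msgid " { id = tolower($2) }
    # ... and rewrite a msgstr line only when file2 has that msgid.
    $1 == "msgstr " && (id in c) { print "msgstr \"" c[id] "\""; next }
    # Every other line is passed through unchanged.
    { print }
' file2 file1)
echo "$result"
```

Without the guard, the msgstr for "Paris" would be replaced by an empty string, because looking up a missing key in an awk array yields the empty string.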

Upvotes: 1

potong

Reputation: 58391

This might work for you (GNU sed):

sed -r '/msgid/{$!N;s|(.*)\n(.*)|/\1/I{n;s/.*/\2/}|}' file2 | sed -rf - file1

Turn file2 into a sed script which is run against file1.

The first sed turns each line containing msgid, together with the line that follows it, into a command that matches that msgid line case-insensitively, prints it, and then replaces the next line with the corresponding msgstr line from file2.
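Running the two stages separately shows what the generated script looks like. This is a sketch assuming GNU sed (for -r, the I address flag and -f -) and made-up inputs in which file2 contains only msgid/msgstr pairs and blank lines:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical inputs; file2 holds only msgid/msgstr pairs and blank lines.
printf 'msgid "Rome"\nmsgstr "new bie"\n\nmsgid "vid"\nmsgstr "newton"\n' > file2
printf 'abc\nmsgid "rome"\nmsgstr ""\nxyz\nmsgid "vid"\nmsgstr ""\n' > file1

# Step 1: the first sed turns each pair into one sed command.
script=$(sed -r '/msgid/{$!N;s|(.*)\n(.*)|/\1/I{n;s/.*/\2/}|}' file2)
echo "$script"

# Step 2: run the generated commands against file1.
updated=$(printf '%s\n' "$script" | sed -rf - file1)
echo "$updated"
```

Since the first sed prints every line of file2 by default, any line that is neither part of a pair nor blank would also end up in the generated script, and the msgid and msgstr text is used as a regular expression and replacement without escaping. Both points are worth checking against real input before relying on this approach.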

Upvotes: 0
