Reputation: 960
I have two text files, each containing blocks of text separated by empty lines. The blocks vary in size.
# ::id 10
# ::snt Yes !
...multiple lines of unstructured data from file 1...

# ::id 11
# ::snt said Lion .
...multiple lines of unstructured data from file 1...

# ::id 12
# ::snt Yes yes !
...multiple lines of unstructured data from file 1...

# ::id 13
# ::snt said Tiger .
...multiple lines of unstructured data from file 1...
and similarly a second file:
# ::id 10
# ::snt No !
...multiple lines of unstructured data from file 2...

# ::id 11
# ::snt said Monkey .
...multiple lines of unstructured data from file 2...

# ::id 12
# ::snt No no !
...multiple lines of unstructured data from file 2...

# ::id 13
# ::snt said Donkey .
...multiple lines of unstructured data from file 2...
I want to merge the blocks from the two files, sorted by their # ::id. Also, I need to maintain the order so that each file1 block comes before the file2 block with the same id. So the final output should be something like:
# ::id 10
# ::snt Yes !
...multiple lines of unstructured data from file 1...

# ::id 10
# ::snt No !
...multiple lines of unstructured data from file 2...

# ::id 11
# ::snt said Lion .
...multiple lines of unstructured data from file 1...

# ::id 11
# ::snt said Monkey .
...multiple lines of unstructured data from file 2...

# ::id 12
# ::snt Yes yes !
...multiple lines of unstructured data from file 1...

# ::id 12
# ::snt No no !
...multiple lines of unstructured data from file 2...

# ::id 13
# ::snt said Tiger .
...multiple lines of unstructured data from file 1...

# ::id 13
# ::snt said Donkey .
...multiple lines of unstructured data from file 2...
How do I do it? Anything will work: bash, sed, awk.
Upvotes: 0
Views: 243
Reputation: 203522
$ awk -v RS= -v ORS='\n\n' 'NR==FNR{a[NR]=$0;next} {print a[FNR] ORS $0}' file1 file2
# ::id 10
# ::snt Yes !

# ::id 10
# ::snt No !

# ::id 11
# ::snt said Lion .

# ::id 11
# ::snt said Monkey .

# ::id 12
# ::snt Yes yes !

# ::id 12
# ::snt No no !

# ::id 13
# ::snt said Tiger .

# ::id 13
# ::snt said Donkey .
The above reads the contents of the files one paragraph at a time, where paragraphs are blocks of text separated by runs of blank lines (courtesy of setting RS to null). While it reads the first file it just stores each paragraph in an array a[1..number of paragraphs]. Then, after it has read all of file1 into a[], for each paragraph it reads from the 2nd file it prints the corresponding paragraph from file1 (a[paragraph number]) first and then the current paragraph from file2. Note that the pairing is by position, so it assumes both files list the same ids in the same order.
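To see what setting RS to null does to the input, here is a minimal sketch on made-up data. The temporary file, its contents, and the summary variable are assumptions for illustration only, not the asker's real files.

```shell
# Hypothetical sample file: two blocks separated by a run of blank lines.
tmp=$(mktemp -d)
printf '# ::id 10\n# ::snt Yes !\n\n\n\n# ::id 11\n# ::snt said Lion .\ndata\n' > "$tmp/file1"

# RS= (empty) turns on paragraph mode: each block becomes one record.
# -F'\n' makes every line of a block a separate field, so NF is the line count.
summary=$(awk -F'\n' -v RS= '
    { printf "record %d: %d lines, first line: %s\n", NR, NF, $1 }
' "$tmp/file1")
printf '%s\n' "$summary"
```

Each block arrives as a single record regardless of how many blank lines separate it from the next one, which is what lets the one-liner above index blocks by NR and FNR.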
Upvotes: 1
Reputation: 67497
This matches records by their id lines, in case the blocks are not aligned in both files:
$ awk -F'\n' -v RS= 'NR==FNR{a[$1]=$0; next}
{printf "%s\n\n%s\n\n",a[$1],$0}' file1 file2
# ::id 10
# ::snt Yes !
...multiple lines of unstructured data from file 1...

# ::id 10
# ::snt No !
...multiple lines of unstructured data from file 2...

# ::id 11
# ::snt said Lion .
...multiple lines of unstructured data from file 1...

# ::id 11
# ::snt said Monkey .
...multiple lines of unstructured data from file 2...

# ::id 12
# ::snt Yes yes !
...multiple lines of unstructured data from file 1...

# ::id 12
# ::snt No no !
...multiple lines of unstructured data from file 2...

# ::id 13
# ::snt said Tiger .
...multiple lines of unstructured data from file 1...

# ::id 13
# ::snt said Donkey .
...multiple lines of unstructured data from file 2...
This can be enhanced to catch records that are missing from file2 as well. As written, a file1 block with no matching id in file2 is never printed.
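One possible way to handle missing records is sketched below. It is not part of the original answer: the sample files, the merged variable, and the choice to append unmatched file1 blocks at the end are all assumptions made for illustration.

```shell
# Hypothetical sample data: id 12 exists only in file1, id 11 only in file2.
tmp=$(mktemp -d)
printf '# ::id 10\n# ::snt Yes !\n\n# ::id 12\n# ::snt Yes yes !\n' > "$tmp/file1"
printf '# ::id 10\n# ::snt No !\n\n# ::id 11\n# ::snt said Monkey .\n' > "$tmp/file2"

merged=$(awk -F'\n' -v RS= '
    NR == FNR { a[$1] = $0; next }   # file1: store each block under its id line
    {
        if ($1 in a) {               # print the matching file1 block first, if any
            printf "%s\n\n", a[$1]
            delete a[$1]             # mark it as already printed
        }
        printf "%s\n\n", $0          # then the current file2 block
    }
    END {                            # file1 blocks that had no match in file2
        for (id in a) printf "%s\n\n", a[id]
    }
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$merged"
```

The leftover file1 blocks are appended after everything else, and when there are several of them awk's for-in loop does not guarantee their order. If they must appear in id order, the result would still need a separate sort step.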
Upvotes: 0
Reputation: 13249
You can achieve this using sed and sort:
sed '/# ::id/N;s/\n/ /;/^$/d' file1 file2 | sort -s -n -k3,3 | sed 's/\(# ::snt.*\)/\n\1\n/'
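The first sed joins each line containing # ::id with the line that follows it and deletes the empty lines.
The result is then sorted numerically by the id number in # ::id xx (the 3rd field). The -s option keeps the sort stable, so for equal ids the file1 line stays ahead of the file2 line.
Finally, the second sed splits each line back into two at # ::snt. Note that this relies on each block consisting of only the # ::id and # ::snt lines.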
Upvotes: 0
Reputation: 3363
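Say: gawk -f merge.awk file1 file2, with merge.awk as below. It needs GNU awk for asorti. Sorting whole blocks as strings would put "No !" before "Yes !" and id 9 after id 10, so the blocks are keyed by zero-padded id and read order instead:
BEGIN { RS = ""; ORS = "\n\n" }
# In paragraph mode with the default FS, $3 is the number after "# ::id".
# Key = id, then read order, so file1 blocks precede file2 blocks with the same id.
{ ARR[sprintf("%010d %010d", $3, NR)] = $0 }
END {
    n = asorti(ARR, KEYS)    # sort the keys as strings (GNU awk extension)
    for (i = 1; i <= n; i++)
        print ARR[KEYS[i]]
}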
Upvotes: 1