Reputation: 960

Merge two text files block by block

I have two text files, each containing blocks of text separated by empty lines. The blocks vary in size.

# ::id 10
# ::snt Yes !
 ...multiple lines of unstructured data from file 1...

# ::id 11
# ::snt said Lion .
 ...multiple lines of unstructured data from file 1...

# ::id 12
# ::snt Yes yes !
 ...multiple lines of unstructured data from file 1...

# ::id 13
# ::snt said Tiger .
 ...multiple lines of unstructured data from file 1...

and similarly a second file:

# ::id 10
# ::snt No !
 ...multiple lines of unstructured data from file 2...

# ::id 11
# ::snt said Monkey .
 ...multiple lines of unstructured data from file 2...

# ::id 12
# ::snt No no !
 ...multiple lines of unstructured data from file 2...

# ::id 13
# ::snt said Donkey .
 ...multiple lines of unstructured data from file 2...

I want to merge the blocks of the two files, sorted by their # ::id. I also need to maintain the order so that, for the same id, the file1 data block comes before the file2 data block. So the final output should be something like:

# ::id 10
# ::snt Yes !
 ...multiple lines of unstructured data from file 1...

# ::id 10
# ::snt No !
 ...multiple lines of unstructured data from file 2...

# ::id 11
# ::snt said Lion .
 ...multiple lines of unstructured data from file 1...

# ::id 11
# ::snt said Monkey .
 ...multiple lines of unstructured data from file 2...

# ::id 12
# ::snt Yes yes !
 ...multiple lines of unstructured data from file 1...

# ::id 12
# ::snt No no !
 ...multiple lines of unstructured data from file 2...

# ::id 13
# ::snt said Tiger .
 ...multiple lines of unstructured data from file 1...

# ::id 13
# ::snt said Donkey .
 ...multiple lines of unstructured data from file 2...

How do I do it? Anything will work: bash, sed, awk.

Upvotes: 0

Answers (4)

Ed Morton

Reputation: 203522

$ awk -v RS= -v ORS='\n\n' 'NR==FNR{a[NR]=$0;next} {print a[FNR] ORS $0}' file1 file2
# ::id 10
# ::snt Yes !

# ::id 10
# ::snt No !

# ::id 11
# ::snt said Lion .

# ::id 11
# ::snt said Monkey .

# ::id 12
# ::snt Yes yes !

# ::id 12
# ::snt No no !

# ::id 13
# ::snt said Tiger .

# ::id 13
# ::snt said Donkey .

The above reads the files one paragraph at a time, where paragraphs are blocks of text separated by runs of blank lines (courtesy of setting RS to null). While it reads the first file it just stores each paragraph in the array a[1..number of paragraphs]. Then, once all of file1 is in a[], it reads the 2nd file and for each paragraph prints the corresponding paragraph from file1 (a[paragraph number]) first and then the current paragraph from file2.

Upvotes: 1

karakfa

Reputation: 67497

This matches records by their id lines, in case the blocks are not aligned in both files:

$ awk -F'\n' -v RS= 'NR==FNR{a[$1]=$0; next}
                            {printf "%s\n\n%s\n\n",a[$1],$0}' file1 file2

# ::id 10
# ::snt Yes !
 ...multiple lines of unstructured data from file 1...

# ::id 10
# ::snt No !
 ...multiple lines of unstructured data from file 2...

# ::id 11
# ::snt said Lion .
 ...multiple lines of unstructured data from file 1...

# ::id 11
# ::snt said Monkey .
 ...multiple lines of unstructured data from file 2...

# ::id 12
# ::snt Yes yes !
 ...multiple lines of unstructured data from file 1...

# ::id 12
# ::snt No no !
 ...multiple lines of unstructured data from file 2...

# ::id 13
# ::snt said Tiger .
 ...multiple lines of unstructured data from file 1...

# ::id 13
# ::snt said Donkey .
 ...multiple lines of unstructured data from file 2...

This can be enhanced to catch records that are missing from file2 as well.
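One way that enhancement might look is sketched below. It is only an illustration under a few assumptions: the function name merge_by_id and the arrays order[] and seen[] are made up for this example, every block starts with its # ::id line, and file1 blocks that have no partner in file2 are appended at the end (in file1 order) rather than slotted in by id.

```shell
# Hypothetical helper (not from the answer above): merge_by_id FILE1 FILE2
merge_by_id() {
    awk -F'\n' -v RS= -v ORS='\n\n' '
        # First file: store each block under its "# ::id N" line, remember the order.
        NR == FNR { a[$1] = $0; order[++n] = $1; next }
        # Second file: print the matching file1 block (if any), then the file2 block.
        {
            if ($1 in a) { print a[$1]; seen[$1] = 1 }
            print
        }
        # Finally, print the file1 blocks whose id never showed up in file2.
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in seen)) print a[order[i]]
        }
    ' "$1" "$2"
}
```

Call it as merge_by_id file1 file2. Blocks that exist only in file2 are printed on their own at their position in file2.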

Upvotes: 0

oliv

Reputation: 13249

You can achieve this using sed and sort:

 sed '/# ::id/N;s/\n/ /;/^$/d' file1 file2 | sort -s -n -k3,3 | sed 's/\(# ::snt.*\)/\n\1\n/'

The first sed joins the line containing # ::id with the line that follows it and deletes the empty lines.

The result is then sorted numerically by the id number in # ::id xx (the 3rd field).

Finally, each line is split back into two where # ::snt is found.
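The same join, sort, split idea can probably be stretched to blocks of any length. The sketch below is not the answer's method but a variant under stated assumptions: the function name sort_blocks is invented, every block begins with a # ::id N line, the byte \001 never occurs in the data, and sort accepts -s for a stable sort (as the command above already assumes).

```shell
# Hypothetical helper (an assumption for illustration): sort_blocks FILE...
sort_blocks() {
    awk -F'\n' -v RS= '
        FNR == 1 { f++ }                  # f = index of the file being read
        {
            split($1, p, " ")             # first line is "# ::id N", so p[3] is N
            block = $0
            gsub(/\n/, "\001", block)     # flatten the block onto a single line
            # Prefix with id and file index; the trailing \001 becomes the blank line later.
            printf "%s\t%d\t%s\001\n", p[3], f, block
        }
    ' "$@" |
    sort -s -t "$(printf '\t')" -k1,1n -k2,2n |   # by id, then by file order
    cut -f3- |                                    # drop the two sort keys
    tr '\001' '\n'                                # restore the original line breaks
}
```

Running sort_blocks file1 file2 should print the blocks in id order, with the file1 block ahead of the file2 block whenever the ids are equal.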

Upvotes: 0

Michael Vehrs

Reputation: 3363

Say: awk -f merge.awk file1 file2

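BEGIN { RS = ""; ORS = "\n\n" }   # read paragraphs, write them blank-line separated
{ ARR[NR] = $0 }                  # collect every block from both files
END {
    n = asort(ARR)                # GNU awk only; compares whole blocks as strings
    for (i = 1; i <= n; i++)
        print ARR[i]
}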

Upvotes: 1
