Sandeep Singh

Reputation: 5191

Remove Matching Paragraphs From File By Searching Another File

I need help with a script that takes two files as input:

File1: [TEXT] contains paragraphs separated by blank lines.
File2: [SEARCH KEYS] contains paragraphs separated by blank lines.

It should create an output file, File3, containing the text of File1 except the paragraphs that exactly match a paragraph in File2.

That is, the script needs to look up each paragraph of File1 in File2. If a perfect match (every line identical) is found, that paragraph is dropped from File3.

Given these two files:

File1:

PARA1_LINE1
PARA1_LINE2
PARA1_LINE3

PARA2_LINE1
PARA2_LINE2

PARA1_LINE1
PARA1_LINE2
PARA1_LINE3

File2:

PARA1_LINE1
PARA1_LINE2
PARA1_LINE3

PARA2_LINE1

Required Output:

File3:

PARA2_LINE1
PARA2_LINE2

Note: the second paragraph [PARA2] is not a complete match (File2 contains only PARA2_LINE1), so it should not be omitted from File3.

Thanks

Upvotes: 1

Views: 428

Answers (2)

jaypal singh

Reputation: 77135

This awk should work:

awk -v RS= -v ORS='\n\n' 'NR==FNR{a[$0]++;next}!($0 in a)' file2 file1
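  • -v RS= sets RS to the empty string, which turns on paragraph mode: each block of lines separated by blank lines is one record. ORS='\n\n' puts a blank line back between the paragraphs we print.
  • While reading file2 (the first file, where NR==FNR holds), we store each entire paragraph as a key in array a.
  • While reading file1, we print a paragraph only if it is not a key in a, i.e. it does not appear in file2.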

$ cat file1
PARA1_LINE1
PARA1_LINE2
PARA1_LINE3

PARA2_LINE1
PARA2_LINE2

PARA1_LINE1
PARA1_LINE2
PARA1_LINE3

$ cat file2
PARA1_LINE1
PARA1_LINE2
PARA1_LINE3

PARA2_LINE1

$ awk -v RS= -v ORS='\n\n' 'NR==FNR{a[$0]++;next}!($0 in a)' file2 file1
PARA2_LINE1
PARA2_LINE2
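
The same awk program, commented and wrapped in a shell function, may make the steps above easier to follow. This is a sketch rather than part of the original answer: the function name drop_matching_paras is hypothetical, and it swaps NR==FNR for FILENAME == ARGV[1], because NR==FNR is also true for every record of file1 when file2 is empty, in which case the one-liner would store file1's paragraphs as keys and print nothing.

```shell
#!/bin/sh
# drop_matching_paras TEXT_FILE KEY_FILE  (hypothetical helper name)
# Prints the paragraphs of TEXT_FILE that do not appear verbatim
# as a paragraph in KEY_FILE.
drop_matching_paras() {
    awk -v RS= -v ORS='\n\n' '
        # RS is empty: paragraph mode, records are separated by blank lines.
        # ARGV[1] is the key file, so its records are stored as array keys.
        # Testing FILENAME instead of NR==FNR stays correct for an empty key file.
        FILENAME == ARGV[1] { keys[$0] = 1; next }
        # Text-file records are printed (followed by ORS) only if not a key.
        !($0 in keys)
    ' "$2" "$1"
}
```

With the file names from the question, the call would be drop_matching_paras file1 file2 > file3.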

Upvotes: 2

Miller

Reputation: 35208

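This uses the input record separator $/ to read each file one paragraph at a time. (Setting $/ to "\n\n" splits on every blank line; Perl's own paragraph mode, $/ = "", would also treat several consecutive blank lines as a single separator.) Note that I didn't use chomp, because the last record might end with only a single newline; instead, s/\n+$// strips whatever trailing newlines a record has.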

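use strict;
use warnings;

# Expect exactly two arguments: the text file and the search-key file.
if (@ARGV != 2) {
    die "Usage: $0 [Text File] [Search Key File]\n";
}

# Remember the text file; @ARGV now holds only the search-key file.
my $file1 = shift;

# Read one blank-line-terminated record (paragraph) at a time.
local $/ = "\n\n";

# Store every search-key paragraph, with trailing newlines stripped.
my %para;
while (<>) {
    s/\n+$//;
    $para{$_} = 1;
}

# Re-read the text file and print each paragraph that is not a key.
local @ARGV = $file1;
while (<>) {
    s/\n+$//;
    print $_, $/ if !$para{$_};
}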
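
For comparison with the chomp note above: awk's paragraph mode (RS set to the empty string) discards the newlines at the end of the input itself, so a last paragraph followed by one newline and one followed by extra blank lines produce the same record, which is why the awk answer needs no s/\n+$// step. A small check, assuming a POSIX awk; the helper name show_records is hypothetical.

```shell
#!/bin/sh
# show_records FILE...  (hypothetical helper name)
# Prints every paragraph-mode record wrapped in brackets, so a newline
# left at the end of a record would appear just before the closing bracket.
show_records() {
    awk -v RS= '{ printf "[%s]\n", $0 }' "$@"
}
```

Run on a file holding P1, a blank line, then P2_L1 and P2_L2, it prints [P1] followed by the bracketed two-line record for P2, with the closing bracket directly after P2_L2 whether the file ends with one newline or several blank lines.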

Upvotes: 1
