Lorenzo

Reputation: 23

Remove values from a file that are present in another file using bash

I have a tab-separated file A containing several values per row:

A   B   C   D   E
F   G   H   I
J   K   L   M
N   O   P
Q   R   S   T
U   V
X   Y   Z

I want to remove from file A the elements contained in the following file B:

A   D
J   M
U   V

resulting in a file C:

B   C   E
F   G   H   I
K   L
N   O   P
Q   R   S   T
X   Y   Z

Is there a way of doing this using bash?

Upvotes: 0

Views: 72

Answers (2)

Socowi

Reputation: 27215

If the entries contain no characters that are special in sed regular expressions (for instance ()[]/\.*?+|), and no entry occurs as part of a longer entry, you can use the following commands:

mapfile -t array < <(<B tr '\t' '\n')
(IFS='|'; sed -r "s/(${array[*]})\t?//g;/^$/d" A > C)

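The first line reads the entries of file B into an array, one entry per element. The second line joins them with | into a single alternation and passes it to sed, which removes every match together with an optional trailing tab and then deletes the lines that are left empty.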
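The joining step can be looked at in isolation. The following is only a small sketch: it recreates the sample file B in a temporary directory (the variable names workdir and pattern are made up for illustration) and prints the alternation that ends up inside the sed expression.

```bash
#!/usr/bin/env bash
# Recreate file B from the question in a temporary directory.
workdir=$(mktemp -d)
printf 'A\tD\nJ\tM\nU\tV\n' > "$workdir/B"

# One array element per entry: tr turns every tab into a newline first.
mapfile -t array < <(tr '\t' '\n' < "$workdir/B")

# "${array[*]}" joins the elements with the first character of IFS.
# The command substitution runs in a subshell, so the changed IFS
# does not leak into the rest of the script.
pattern=$(IFS='|'; printf '%s' "${array[*]}")
printf '%s\n' "$pattern"
```

For the sample B this prints A|D|J|M|U|V.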

In your example, the constructed command ...

sed -r 's/(A|D|J|M|U|V)\t?//g;/^$/d' A > C

... generates the following file C (the spaces are actually tabs; the third line keeps a trailing tab because M was the last field of its row):

B   C   E
F   G   H   I
K   L   
N   O   P
Q   R   S   T
X   Y   Z
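If the entries might contain regex metacharacters, or if an entry could appear inside a longer field, comparing whole fields as plain strings avoids both problems and also leaves no trailing tab. The following is a minimal sketch of that idea, not the command above: it assumes bash 4 or later (for associative arrays), recreates the sample files in a temporary directory, and uses the made-up names remove and kept.

```bash
#!/usr/bin/env bash
# Recreate the sample files A and B from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'A\tB\tC\tD\tE\nF\tG\tH\tI\nJ\tK\tL\tM\nN\tO\tP\nQ\tR\tS\tT\nU\tV\nX\tY\tZ\n' > A
printf 'A\tD\nJ\tM\nU\tV\n' > B

# Collect every tab-separated entry of B as a key of the set "remove".
declare -A remove=()
while IFS=$'\t' read -r -a fields; do
    for f in "${fields[@]}"; do
        remove["$f"]=1
    done
done < B

# Copy A to C row by row, keeping only the fields that are not in the set.
while IFS=$'\t' read -r -a fields; do
    kept=()
    for f in "${fields[@]}"; do
        [[ -n ${remove[$f]+x} ]] || kept+=("$f")
    done
    # Write the row only if something is left, re-joined with tabs.
    if (( ${#kept[@]} > 0 )); then
        (IFS=$'\t'; printf '%s\n' "${kept[*]}")
    fi
done < A > C

cat C
```

Because the fields are looked up as array keys, an entry such as A only removes a field that is exactly A. A shell loop like this will be slower than sed on large files.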

Upvotes: 1

RomanPerekhrest

Reputation: 92854

awk solution:

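awk 'NR == FNR {   # first file (file_b): add both entries of the row to one alternation
         pat = sprintf("%s%s|%s", (pat ? pat "|" : ""), $1, $2); next
     }
     {   # second file (file_a): remove each entry together with its adjacent whitespace
         gsub("^(" pat ")[[:space:]]*|[[:space:]]*(" pat ")", "");
         if (NF) print   # skip rows that are now empty
     }' file_b file_a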

The output:

B   C   E
F   G   H   I
K   L
N   O   P
Q   R   S   T
X   Y   Z
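The command above expects exactly two entries per row of file_b and treats them as regular expressions. As a hedged alternative, the entries can be stored as array keys and compared field by field, which works for any number of entries per row and matches them literally. This is only a sketch: the array name drop and the output name file_c are made up for illustration, and the sample files are recreated in a temporary directory.

```bash
#!/usr/bin/env bash
# Recreate the sample files from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'A\tB\tC\tD\tE\nF\tG\tH\tI\nJ\tK\tL\tM\nN\tO\tP\nQ\tR\tS\tT\nU\tV\nX\tY\tZ\n' > file_a
printf 'A\tD\nJ\tM\nU\tV\n' > file_b

awk -F'\t' -v OFS='\t' '
    # First file (file_b): remember every field as an entry to drop.
    NR == FNR { for (i = 1; i <= NF; i++) drop[$i]; next }
    # Second file (file_a): rebuild each row from the fields not in "drop".
    {
        out = ""; n = 0
        for (i = 1; i <= NF; i++)
            if (!($i in drop)) out = (n++ ? out OFS $i : $i)
        if (n) print out    # skip rows with no fields left
    }' file_b file_a > file_c

cat file_c
```

Note that the NR == FNR idiom assumes file_b is not empty; with an empty file_b the rows of file_a would be read as entries to drop.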

Upvotes: 0
