Reputation: 23
I have a tab-separated file A containing several values per row:
A B C D E
F G H I
J K L M
N O P
Q R S T
U V
X Y Z
I want to remove from file A the elements contained in the following file B:
A D
J M
U V
resulting in a file C:
B C E
F G H I
K L
N O P
Q R S T
X Y Z
Is there a way of doing this using bash?
Upvotes: 0
Views: 72
Reputation: 27215
In case the entries do not contain any characters that are special to sed (for instance ()[]/\.*?+), you can use the following command:
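mapfile -t array < <(<B tr '\t' '\n')
(IFS='|'; sed -r "s/(${array[*]})\t?//g;s/\t$//;/^$/d" A > C)
This command reads file B into an array, one entry per element. From the array a sed command is constructed by joining the entries with |. The sed command removes every matching entry together with the tab that follows it, strips the trailing tab that is left behind when the last entry of a row is removed, and deletes rows that end up empty.
In your example, the constructed command ...
sed -r 's/(A|D|J|M|U|V)\t?//g;s/\t$//;/^$/d' A > C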
... generates the following file C (spaces are actually tabs):
B C E
F G H I
K L
N O P
Q R S T
X Y Z
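If the entries may contain characters that are special in a regular expression, one possible workaround is to backslash-escape them before joining them into the pattern. The following is only a sketch: it assumes GNU sed (for -E and \t), the function name remove_escaped is made up for illustration, and entries are still matched as substrings rather than as whole fields.

```bash
# Hypothetical helper: remove_escaped FILE_A FILE_B
# Prints FILE_A without the entries listed in FILE_B. Assumes GNU sed.
remove_escaped() {
    # One entry per line, skip empty ones, escape ERE metacharacters and "/",
    # then join everything with "|" into a single alternation.
    pattern=$(tr '\t' '\n' < "$2" | sed '/^$/d' |
              sed 's,[][\\/.^$*+?(){}|],\\&,g' | paste -sd '|' -)
    # Nothing to remove: pass FILE_A through unchanged.
    if [ -z "$pattern" ]; then cat "$1"; return; fi
    # Remove each entry with its following tab, drop a trailing tab,
    # and delete lines that end up empty.
    sed -E "s/($pattern)\t?//g; s/\t\$//; /^\$/d" "$1"
}
```

It would be called as remove_escaped A B > C. With an entry such as a.b in file B, only the literal text a.b is removed, whereas an unescaped pattern would also match acb.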
Upvotes: 1
Reputation: 92854
awk solution:
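awk '
# First file (file_b): collect its two fields per line into one alternation pattern
NR == FNR { pat = sprintf("%s%s|%s", (pat ? pat "|" : ""), $1, $2); next }
# Second file (file_a): remove every match plus adjacent whitespace, print non-empty lines
{
    gsub("^(" pat ")[[:space:]]*|[[:space:]]*(" pat ")", "")
    if (NF) print
}' file_b file_a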
The output:
B C E
F G H I
K L
N O P
Q R S T
X Y Z
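Both regex-based approaches match entries as substrings, so an entry A would also be stripped from a value such as AB. A possible alternative, sketched below, compares whole tab-separated fields against a lookup table instead of building a pattern. The function name remove_fields is hypothetical, file B may have any number of fields per line, and the sketch assumes file B is not empty (otherwise the NR == FNR test would treat file A as the list).

```bash
# Hypothetical helper: remove_fields FILE_A FILE_B
# Prints FILE_A without the fields that occur anywhere in FILE_B.
# Assumes FILE_B is non-empty and both files are tab-separated.
remove_fields() {
    awk -F '\t' -v OFS='\t' '
        # First file (FILE_B): remember every field as a key to drop.
        NR == FNR { for (i = 1; i <= NF; i++) drop[$i]; next }
        # Second file (FILE_A): rebuild each line from the fields to keep.
        {
            out = ""; n = 0
            for (i = 1; i <= NF; i++)
                if (!($i in drop)) out = (n++ ? out OFS : "") $i
            # Skip lines whose fields were all removed.
            if (n) print out
        }' "$2" "$1"
}
```

For the files in the question this would be called as remove_fields A B > C. Because the comparison is an exact string lookup, characters that are special in regular expressions need no escaping.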
Upvotes: 0