Suppose I have setA.txt:
a|b|0.1
c|d|0.2
b|a|0.3
and I also have setB.txt:
c|d|200
a|b|100
Now I want to delete from setA.txt the lines whose first two fields match the first two fields of a line in setB.txt, so the output should be:
b|a|0.3
I tried:
comm -23 <(sort setA.txt) <(sort setB.txt)
But comm compares whole lines, so it won't work here. How can I do this?
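A small reproduction of why the comm attempt fails, using the sample data above. This is only a sketch: it writes sorted copies to scratch files instead of using process substitution so it also runs under a plain POSIX sh, and it forces LC_ALL=C so sort and comm agree on ordering. The second comparison cuts each file down to fields 1 and 2 first, which shows the keys the two files really share.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'a|b|0.1' 'c|d|0.2' 'b|a|0.3' > setA.txt
printf '%s\n' 'c|d|200' 'a|b|100' > setB.txt

# Whole-line comparison: "a|b|0.1" and "a|b|100" differ as lines,
# so every line of setA.txt is reported as unique to setA.txt.
LC_ALL=C sort setA.txt > setA.sorted
LC_ALL=C sort setB.txt > setB.sorted
whole_line=$(LC_ALL=C comm -23 setA.sorted setB.sorted)

# Key-only comparison: keep just fields 1-2 before comparing,
# which reveals the keys both files have in common.
cut -d'|' -f1,2 setA.txt | LC_ALL=C sort > keysA.sorted
cut -d'|' -f1,2 setB.txt | LC_ALL=C sort > keysB.sorted
shared_keys=$(LC_ALL=C comm -12 keysA.sorted keysB.sorted)

printf 'whole-line comm -23:\n%s\n' "$whole_line"
printf 'shared keys:\n%s\n' "$shared_keys"
```

Knowing the shared keys alone still does not give back the full setA.txt lines, which is why the answers below key the comparison on fields 1 and 2 directly.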
Upvotes: 2
This should work:
sed -n 's#\(^[^|]*|[^|]*\)|.*#/^\1/d#p' setB.txt |sed -f- setA.txt
How this works: the first command,
sed -n 's#\(^[^|]*|[^|]*\)|.*#/^\1/d#p'
generates this output:
/^c|d/d
/^a|b/d
which the second sed, after the pipe, reads as its script from standard input (-f-) and applies to setA.txt, producing:
b|a|0.3
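One caveat worth checking: the generated patterns (/^c|d/d, /^a|b/d) stop before the trailing |, so they also match any line whose key merely starts with a|b. The sketch below uses one hypothetical extra line, a|bc|0.5, to show the difference, and a variant that captures the trailing | as well. It writes the generated script to a file instead of piping it into -f-, purely so it does not depend on how a given sed handles reading its script from standard input. Neither variant escapes regex metacharacters, so fields containing characters such as . or [ would still be interpreted as regular expressions.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# setA.txt from the question plus one hypothetical extra line, "a|bc|0.5",
# whose key (a|bc) does NOT appear in setB.txt.
printf '%s\n' 'a|b|0.1' 'c|d|0.2' 'b|a|0.3' 'a|bc|0.5' > setA.txt
printf '%s\n' 'c|d|200' 'a|b|100' > setB.txt

# Original pattern: the key is captured without its trailing "|",
# so /^a|b/d also deletes "a|bc|0.5".
sed -n 's#\(^[^|]*|[^|]*\)|.*#/^\1/d#p' setB.txt > loose.sed
loose=$(sed -f loose.sed setA.txt)

# Variant: capture the trailing "|" too, so each generated command
# (/^c|d|/d, /^a|b|/d) only matches a complete two-field key.
sed -n 's#^\([^|]*|[^|]*|\).*#/^\1/d#p' setB.txt > strict.sed
strict=$(sed -f strict.sed setA.txt)

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

With the original pattern only b|a|0.3 survives; with the variant, a|bc|0.5 is kept as well, since its key is not in setB.txt.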
Upvotes: 2
$ awk -F\| 'FNR==NR{seen[$1,$2]=1;next;} !seen[$1,$2]' setB.txt setA.txt
b|a|0.3
This reads through setB.txt just once, extracts the needed information from it, and then reads through setA.txt while deciding which lines to print.
-F\|
This sets the field separator to a vertical bar, |.
FNR==NR{seen[$1,$2]=1;next;}
FNR is the number of lines read so far from the current file and NR is the total number of lines read across all files. Thus, when FNR==NR, we are reading the first file, setB.txt (as long as setB.txt is not empty). In that case, set the entry of the associative array seen for the key made of fields one and two to true (1). Finally, next skips the rest of the commands and starts over with the next line.
!seen[$1,$2]
If we get to this command, we are working on the second file, setA.txt. Since ! means negation, the condition is true if seen[$1,$2] is false, which means that this combination of fields one and two was not in setB.txt. In that case the default action is performed, which is to print the line.
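For readability, the same awk program can be spread over several lines with a comment on each rule. This is a sketch of the one-liner above, not a different algorithm; it runs against the question's sample files. In awk, seen[$1, $2] joins the two fields with the built-in SUBSEP separator to form a single array key. One limitation of the FNR==NR idiom: if setB.txt is empty, FNR==NR is still true while reading setA.txt, so every line would be consumed by the first rule and nothing would be printed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'a|b|0.1' 'c|d|0.2' 'b|a|0.3' > setA.txt
printf '%s\n' 'c|d|200' 'a|b|100' > setB.txt

# Same program as the one-liner, one rule per block with comments.
result=$(awk -F'|' '
    # While reading the first file (setB.txt), FNR == NR.
    FNR == NR {
        seen[$1, $2] = 1   # remember the key built from fields 1 and 2
        next               # skip the remaining rules for this line
    }
    # Only lines of setA.txt reach this rule; print those whose key was not seen.
    !seen[$1, $2]
' setB.txt setA.txt)

printf '%s\n' "$result"
```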
Upvotes: 3
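(IFS=$'|'; while read -r x y z; do grep -q -P "^\Q$x|$y|\E" setB.txt || echo "$x|$y|$z"; done < setA.txt)
Explanation: grep -q only tests whether grep finds a match and prints nothing. -P selects Perl-compatible regular expressions, so the | characters are matched literally because \Q...\E quotes everything between them. The leading ^ anchors the key to the start of the line, so a setB.txt line such as ca|b|7 does not count as a match for the key a|b. read -r keeps any backslashes in the input unchanged, and redirecting setA.txt into the loop avoids the extra cat. Note that grep is run once per line of setA.txt, so this is slower than the awk answer on large files.
IFS=$'|' makes bash split each line on | instead of on whitespace (space, tab, newline).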
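To see how IFS='|' splits a line for read, here is a small sketch that feeds one line of setA.txt to read with three variables. The second part uses a hypothetical four-field line to show that the last variable receives the rest of the line, separators included, which is what lets the loop rebuild the original line as "$x|$y|$z". It uses a here-document rather than bash's <<< so it also runs in a POSIX sh.

```shell
# Split one sample line from setA.txt the way the loop does.
IFS='|' read -r x y z <<EOF
b|a|0.3
EOF
printf 'x=%s y=%s z=%s\n' "$x" "$y" "$z"

# With only three variables, the last one gets the remainder of the
# line unsplit (hypothetical four-field line).
IFS='|' read -r p q r <<EOF
a|b|c|d
EOF
printf 'p=%s q=%s r=%s\n' "$p" "$q" "$r"
```

The IFS='|' prefix applies only to that read command, so the shell's normal field splitting is unaffected afterwards.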
Upvotes: 0