Bamqf

Reputation: 3542

Delete lines from a file matching first 2 fields from a second file in shell script

Suppose I have setA.txt:

a|b|0.1
c|d|0.2
b|a|0.3

and I also have setB.txt:

c|d|200
a|b|100

Now I want to delete from setA.txt every line whose first 2 fields match the first 2 fields of some line in setB.txt, so the output should be:

b|a|0.3

I tried:

comm -23 <(sort setA.txt) <(sort setB.txt)

But comm compares whole lines, so it won't work here. How can I do this?

Upvotes: 2

Views: 61

Answers (3)

Jahid

Reputation: 22428

This should work:

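sed -n 's#^\([^|]*|[^|]*|\).*#/^\1/d#p' setB.txt | sed -f - setA.txt

How this works:

sed -n 's#^\([^|]*|[^|]*|\).*#/^\1/d#p'

turns each line of setB.txt into a delete command:

/^c|d|/d
/^a|b|/d

The second sed after the pipe reads these commands as its script (with GNU sed, -f - means "read the script from standard input") and outputs:

b|a|0.3

Keeping the trailing | in the key stops a key such as a|b from also deleting a line like a|bc|0.4. Fields containing regex metacharacters (., *, [ and so on) would need escaping first.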
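A self-contained way to try the two-step sed approach on the sample data, plus an extra a|bc|0.4 line to exercise the closing | in the key. This is a sketch assuming GNU sed (for -f -) and fields free of regex metacharacters; the helper name delete_matching is hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# delete_matching KEYFILE DATAFILE  (hypothetical helper name)
# Prints the DATAFILE lines whose first two |-separated fields do not
# match the first two fields of any KEYFILE line.
delete_matching() {
  # 1) Turn each "f1|f2|rest" line of KEYFILE into the sed command "/^f1|f2|/d".
  # 2) Pipe those commands to a second sed, which reads them as its script.
  sed -n 's#^\([^|]*|[^|]*|\).*#/^\1/d#p' "$1" | sed -f - "$2"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'a|b|0.1\nc|d|0.2\nb|a|0.3\na|bc|0.4\n' > "$tmp/setA.txt"
printf 'c|d|200\na|b|100\n' > "$tmp/setB.txt"

delete_matching "$tmp/setB.txt" "$tmp/setA.txt"
# prints:
# b|a|0.3
# a|bc|0.4
```

If setB.txt is empty, the generated script is empty and every line of setA.txt is printed unchanged.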

Upvotes: 2

John1024

Reputation: 113844

$ awk -F\| 'FNR==NR{seen[$1,$2]=1;next;} !seen[$1,$2]' setB.txt setA.txt
b|a|0.3

This reads through setB.txt just once, extracts the needed information from it, and then reads through setA.txt while deciding which lines to print.

How it works

  • -F\|

    This sets the field separator to a vertical bar, |.

  • FNR==NR{seen[$1,$2]=1;next;}

    FNR is the number of lines read so far from the current file and NR is the total number of lines read. Thus, when FNR==NR, we are reading the first file, setB.txt. If so, set seen[$1,$2] to 1 (true), using fields one and two as the key of the associative array seen. Lastly, next skips the remaining commands and starts over on the next line.

  • !seen[$1,$2]

    If we get to this command, we are working on the second file, setA.txt. Since ! means negation, the condition is true when seen[$1,$2] is false, which means that this combination of fields one and two was not in setB.txt. In that case awk performs the default action, which is to print the line.
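One caveat worth knowing: if setB.txt is empty, FNR==NR stays true for every line of setA.txt, so the first block swallows them all and nothing is printed. A minimal variant, assuming a POSIX awk, tests FILENAME against ARGV[1] instead and uses the (i,j) in array test so lookups do not create entries. The helper name filter_by_keys is hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# filter_by_keys KEYFILE DATAFILE  (hypothetical helper name)
# Prints DATAFILE lines whose first two |-separated fields are absent from KEYFILE.
filter_by_keys() {
  awk -F'|' '
    # Records from the first file argument (KEYFILE): remember the key.
    FILENAME == ARGV[1] { seen[$1, $2] = 1; next }
    # Records from the second file: print only keys never seen.
    !(($1, $2) in seen)
  ' "$1" "$2"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'a|b|0.1\nc|d|0.2\nb|a|0.3\n' > "$tmp/setA.txt"
printf 'c|d|200\na|b|100\n' > "$tmp/setB.txt"
: > "$tmp/empty.txt"

filter_by_keys "$tmp/setB.txt" "$tmp/setA.txt"    # prints b|a|0.3
filter_by_keys "$tmp/empty.txt" "$tmp/setA.txt"   # prints all three setA lines
```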

Upvotes: 3

Bao Haojun

Reputation: 1006

(IFS=$'|'; while read -r x y z; do grep -q -P "^\Q$x|$y|\E" setB.txt || echo "$x|$y|$z"; done < setA.txt)

Explanation: grep -q only tests whether the pattern matches and prints nothing. -P selects Perl-compatible syntax, so the \Q...\E construct makes everything between them, including |, match literally, and the leading ^ anchors the key to the start of a setB.txt line. read -r keeps any backslashes in the data intact.

IFS=$'|' makes read split each line on | instead of the default whitespace (space, tab, newline).
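A self-contained sketch of the same loop, first showing the IFS splitting in isolation and then running on the sample data. It assumes bash (for <<< and local) and a GNU grep built with PCRE support (for -P); the pattern is anchored with ^ so a setB.txt key only matches at the start of a line. The helper name keep_unmatched is hypothetical.

```shell
#!/usr/bin/env bash
set -u

# With IFS set to |, read splits one record into three variables;
# the last variable (z) receives everything after the second |.
IFS='|' read -r x y z <<< 'a|b|0.1'
printf '%s\n' "$x" "$y" "$z"   # prints a, b and 0.1 on separate lines

# keep_unmatched KEYFILE DATAFILE  (hypothetical helper name)
keep_unmatched() {
  local x y z
  while IFS='|' read -r x y z; do
    # \Q...\E quotes the fields so | matches literally;
    # ^ anchors the key to the start of a KEYFILE line.
    grep -q -P "^\Q$x|$y|\E" "$1" || printf '%s\n' "$x|$y|$z"
  done < "$2"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'a|b|0.1\nc|d|0.2\nb|a|0.3\n' > "$tmp/setA.txt"
printf 'c|d|200\na|b|100\n' > "$tmp/setB.txt"

keep_unmatched "$tmp/setB.txt" "$tmp/setA.txt"   # prints b|a|0.3
```

This runs one grep per line of setA.txt, so it rereads setB.txt each time; the awk answer reads it only once.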

Upvotes: 0
