Reputation: 5066

different lines in two files when ignoring last column - in bash

I have two files, a smaller one and a bigger one, and the bigger one contains all lines of the smaller one. The matching lines are almost the same; only the last column differs.

file_smaller
  A NM 0
  B GT 4

file_bigger
  A NM 5 <-same as in file_smaller according to my rules
  C TY 2
  D OP 6
  B GT 3 <-same as in file_smaller according to my rules

I would like to print the lines where the two files differ, that is:

wished_output
  C TY 2
  D OP 6

Could you please help me to do so? Thanks a lot.

Upvotes: 1

Answers (4)

glenn jackman

Reputation: 247012

grep -vf <(cut -d " " -f 1-2 file_smaller| sed 's/^/^/') file_bigger

The process substitution results in this:

^A NM
^B GT

Then grep -v prints only the lines of "file_bigger" that match none of those patterns.
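
One caveat: a bare prefix pattern such as ^A NM would also match a line like "A NMX 7". A minimal sketch that appends the delimiter to each pattern, so the second column has to match in full. It assumes columns separated by exactly one space, no leading whitespace, and no regex metacharacters in the first two columns. The function name diff_ignoring_last is hypothetical, and a pipe into grep -f - (accepted by GNU grep) stands in for the process substitution so the sketch also runs in plain sh.

```shell
# Hypothetical helper: print the lines of $2 whose first two columns
# do not occur as the first two columns of any line in $1.
diff_ignoring_last() {
    # Build one anchored pattern per line of the smaller file, e.g. "^A NM ".
    # The trailing space keeps "A NM" from matching "A NMX".
    cut -d ' ' -f 1-2 "$1" |
        sed 's/^/^/; s/$/ /' |
        grep -vf - "$2"    # "-f -" reads the pattern list from stdin
}
```

Called as diff_ignoring_last file_smaller file_bigger on the sample data from the question, this prints the C TY 2 and D OP 6 lines.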

Upvotes: 1

ormaaj

Reputation: 6607

Bash 4 using associative arrays:

#!/usr/bin/env bash

f() {
    if (( $# != 2 )); then
        echo "usage: ${FUNCNAME} <smaller> <bigger>" >&2
        return 1
    fi

    local -A smaller
    local -a x

    while read -ra x; do
        smaller["${x[@]::2}"]=0
    done <"$1"

    while read -ra x; do
        ((${smaller["${x[@]::2}"]:-1})) && echo "${x[*]}"
    done <"$2"
}

f /dev/fd/3 /dev/fd/0 <<"SMALLER" 3<&0 <<"BIGGER"
A NM 0
B GT 4
SMALLER
A NM 5
C TY 2
D OP 6
B GT 3
BIGGER

Upvotes: 0

jim mcnamara

Reputation: 16389

awk 'NR == FNR { seen[$1, $2]; next }   # first file (file_smaller): remember columns 1-2
     !(($1, $2) in seen)                # second file (file_bigger): print lines with unseen keys
    ' file_smaller file_bigger

See if that meets your needs.

Upvotes: 1

byrondrossos

Reputation: 2117

You can do the following:

cat file_bigger file_smaller | sed 's/[^ ]*$//' | sort | uniq -u > temp_pat
grep -f temp_pat file_bigger; rm temp_pat

which will (in this order)

merge the files
remove the last column
sort the result
write only the lines that occur exactly once to temp_pat
find the matching original lines in file_bigger

All in all, this gives the expected result.
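
The same steps can be chained without the temporary file. A minimal sketch, assuming single-space-separated columns, no trailing whitespace, keys that occur at most once per file, and a grep that accepts -f - to read patterns from standard input (GNU grep does). The function name lines_only_in_bigger is hypothetical.

```shell
# Hypothetical helper: $1 is the smaller file, $2 the bigger one.
lines_only_in_bigger() {
    cat "$2" "$1" |
        sed 's/[^ ]*$//' |   # drop the last column, keep the trailing space
        sort |
        uniq -u |            # keep only keys that occur exactly once
        grep -Ff - "$2"      # -F: treat each key as a fixed string, not a regex
}
```

Because the keys are matched as fixed strings anywhere in the line, a key such as "C TY " would also select a line like "XC TY 2"; anchoring the patterns would need the regex form instead.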

Upvotes: 2
