Tobias

Reputation: 564

Find the difference between two files with bash

I know a few similar questions have already been answered, but none of the code I found in those topics worked for my problem. So here is the description.

I have a problem with two files. The first file has 308370 lines, the other 308369 lines. Both files need to have the same length and the same order. I have already sorted them. The column on which the two files can be compared is column 2, so to make things easier I extracted the second column of each file into a separate temp file.

I tried several things. I compared both temp files and searched for empty lines, but every attempt returned nothing. I found no difference, but obviously there must be one. It is annoying. Hopefully you can help me.

This is what the temp files look like:

rs12345
rs34567
rs45679567
rs345635

This is the bash code I have already tried:

comm file1 file2
grep -v -F -x -f file1 file2
awk 'FNR==NR{a[$0]++;next}!a[$0]' file1 file2
diff file_1 file_2 | grep '^>' | cut -c 3-

In the end I want to delete the one line that is in file 1 but not in file 2. Thank you in advance for your help.

Best, Tobi

Upvotes: 0

Views: 825

Answers (3)

Thushi

Reputation: 188

If you can use a GUI tool, I suggest meld. It is easy to use and shows even minor differences such as an extra space. Otherwise you can use diff; check the diff man page for more info.
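As a rough sketch of the diff route (the file names and sample contents below are made up for illustration): in diff's default output, lines prefixed with `<` exist only in the first file and lines prefixed with `>` only in the second. So if the extra line sits in the first file, filtering for `^>` returns nothing. The change header (here `3d2`) also gives the line number, which helps when the extra line is empty and therefore easy to overlook.

```shell
#!/bin/sh
# Hypothetical sample data: file1 carries one extra (empty) line at line 3.
tmpdir=$(mktemp -d)
printf 'rs12345\nrs34567\n\nrs345635\n' > "$tmpdir/file1"
printf 'rs12345\nrs34567\nrs345635\n'   > "$tmpdir/file2"

# diff exits with status 1 when the files differ; that is expected here.
diff "$tmpdir/file1" "$tmpdir/file2" > "$tmpdir/report"

header=$(head -n 1 "$tmpdir/report")              # change header, "3d2" for this sample
only_in_first=$(grep -c '^<' "$tmpdir/report")    # lines present only in file1
only_in_second=$(grep -c '^>' "$tmpdir/report")   # lines present only in file2

echo "header: $header, only in file1: $only_in_first, only in file2: $only_in_second"
```

Reading the header first and the `<` / `>` counts second should make it clear which file holds the extra line before anything is deleted.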

Upvotes: 1

Tobias

Reputation: 564

First of all, thanks again for helping. A couple of minutes after posting I solved my problem. I'm really sorry for taking up your time.

When I sorted the files I saw that the extra line was an empty line. So I cut out this line and that's it. But I'm a bit curious about it, because I had already checked whether the file has an empty line. For this I used:

grep -v '^$' input > output

It seems that this doesn't work. I'm really sorry, but I will definitely try your code, @Wintermute. It looks awesome.
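One possible explanation, offered only as an assumption since the original files are not shown here: the line may not have been truly empty but may have held whitespace or a carriage return, which `^$` does not match. A minimal sketch with made-up sample data, comparing the `^$` filter with one that keeps only lines containing at least one non-whitespace character:

```shell
#!/bin/sh
# Hypothetical sample: line 2 holds a single space, line 4 a lone carriage return.
tmpdir=$(mktemp -d)
printf 'rs12345\n \nrs34567\n\r\nrs345635\n' > "$tmpdir/input"

# '^$' matches only lines with no characters at all, so nothing is removed here.
grep -v '^$' "$tmpdir/input" > "$tmpdir/still_dirty"

# '[^[:space:]]' selects lines that contain at least one non-whitespace character.
grep '[^[:space:]]' "$tmpdir/input" > "$tmpdir/clean"

# grep -c '' counts every line in a file.
dirty_count=$(grep -c '' "$tmpdir/still_dirty")
clean_count=$(grep -c '' "$tmpdir/clean")

echo "lines after the ^\$ filter: $dirty_count, after the [[:space:]] filter: $clean_count"
```

If the two counts differ on the real file, the "empty" line contained invisible characters.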

Best, Tobi

Upvotes: 0

Wintermute

Reputation: 44043

If I understand you correctly,

#!/bin/sh

awk -v file=0 -v offset=0 '
  file == 0 {
    data[FNR] = $0                       # read first file into memory, both
    key[FNR]  = $2                       # lines and isolated keys
  }
  file == 1 {
    while(key[FNR + offset] != $2) {     # When parsing the second file,
      offset = offset + 1                # skip lines in the first that do not
                                         # match keys in the second
      if(FNR + offset > length(key)) {
        exit
      }
    }
    print data[FNR + offset]             # when key is found, print corresponding
  }                                      # line from the first file
  ENDFILE {                              # ENDFILE is a GNU awk (gawk) extension
    file = file + 1                      # set flag when first file is over.
  }' longer.txt shorter.txt

should do the trick. Given two files

foo 1 bar
foo 2 bar
foo 3 bar
foo 4 bar

and

qux 1 xyzzy
qux 2 xyzzy
qux 4 xyzzy

it prints

foo 1 bar
foo 2 bar
foo 4 bar
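To see which line gets dropped rather than the lines that are kept, a set-based variant can be sketched as follows. This is not the script above but a simpler alternative that ignores order and duplicates; it reuses the same sample data, the temp directory is only for illustration, and it assumes the shorter file is not empty.

```shell
#!/bin/sh
# Same sample data as above, written to a scratch directory.
tmpdir=$(mktemp -d)
printf 'foo 1 bar\nfoo 2 bar\nfoo 3 bar\nfoo 4 bar\n' > "$tmpdir/longer.txt"
printf 'qux 1 xyzzy\nqux 2 xyzzy\nqux 4 xyzzy\n'      > "$tmpdir/shorter.txt"

# First file (NR == FNR): remember every key found in column 2 of the shorter file.
# Second file: print line number and content of each line whose key was never seen.
extra=$(awk 'NR == FNR { seen[$2] = 1; next }
             !($2 in seen) { print FNR ": " $0 }' \
        "$tmpdir/shorter.txt" "$tmpdir/longer.txt")

echo "$extra"
```

Printing the line number alongside the content means an empty extra line still shows up as a visible entry.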

Upvotes: 1
