Reputation: 564
I know a couple of similar questions have already been answered, but none of the code I found in those topics worked for my problem. So here is the description.
I have a problem with two files. The first file has 308370 lines, the other 308369. Both files need to have the same length and the same order. I have already sorted them. The column on which the two files can be compared is column 2, so to make things easier I extracted the second column from each file into a separate temp file.
I tried several things. I compared both temp files and searched for empty lines, but found nothing. I see no difference, yet obviously there must be one. It is annoying. Hopefully you can help me.
This is what the temp files look like:
rs12345
rs34567
rs45679567
rs345635
This is the bash code I have already tried:
comm file1 file2
grep -v -F -x -f file1 file2
awk 'FNR==NR{a[$0]++;next}!a[$0]' file1 file2
diff file_1 file_2 | grep '^>' | cut -c 3-
In the end I want to delete the one line that is in file 1 but not in file 2. Thank you in advance for your help.
Best, Tobi
Upvotes: 0
Views: 825
Reputation: 188
If you can use a GUI tool, I suggest meld. It is easy to use and shows minor differences such as an extra space. Otherwise you can use diff; check the diff man page for more info.
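As a rough sketch of the diff route, assuming two key files like the ones in the question (the file names and sample keys below are made up for illustration): diff marks lines that exist only in the first file with `<` and lines that exist only in the second with `>`, so the extra line in file 1 shows up under `<`.

```shell
# Made-up sample data: file1 carries one extra key.
tmp=$(mktemp -d)
printf 'rs12345\nrs34567\nrs45679567\nrs345635\n' > "$tmp/file1"
printf 'rs12345\nrs34567\nrs345635\n' > "$tmp/file2"

# Lines only in file1 are prefixed "< " by diff; strip that 2-character marker.
only_in_file1=$(diff "$tmp/file1" "$tmp/file2" | grep '^<' | cut -c 3-)
echo "$only_in_file1"

# Remove exactly those lines from file1 (fixed-string, whole-line match).
printf '%s\n' "$only_in_file1" > "$tmp/extra"
grep -v -F -x -f "$tmp/extra" "$tmp/file1" > "$tmp/file1.clean"
```

Note that the last step drops every line equal to the extra key, so it only fits if the keys are unique.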
Upvotes: 1
Reputation: 564
First of all, thanks again for helping. A couple of minutes after posting I solved my problem. I'm really sorry for taking up your time.
When I sorted the files I saw that the extra line was an empty line. So I cut out this line and that's it. But I'm a bit curious, because I had already checked whether the file has an empty line. For this I used:
grep -v '^$' input > output
It seems that this doesn't work. I'm really sorry, but I will definitely try your code, @Wintermute. It looks awesome.
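One possible explanation, and this is only an assumption since the original file is not shown: the "empty" line may have held a space, a tab or a Windows carriage return. The pattern `^$` matches only lines with zero characters, so such a line survives the filter. A small sketch with made-up data:

```shell
tmp=$(mktemp -d)
# Made-up sample: line 2 holds a single space, line 4 a carriage return.
printf 'rs12345\n \nrs34567\n\r\nrs345635\n' > "$tmp/input"

# '^$' matches only truly empty lines, so both odd lines are still counted.
grep -c -v '^$' "$tmp/input"

# The POSIX class [[:space:]] also covers spaces, tabs and carriage returns.
grep -v '^[[:space:]]*$' "$tmp/input" > "$tmp/output"
grep -c '' "$tmp/output"
```

With this input the first count includes the two whitespace-only lines, while the second filter keeps only the three real keys.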
Best, Tobi
Upvotes: 0
Reputation: 44043
If I understand you correctly,
#!/bin/sh
awk -v file=0 -v offset=0 '
file == 0 {
    # read first file into memory, both lines and isolated keys
    data[FNR] = $0
    key[FNR] = $2
}
file == 1 {
    # When parsing the second file, skip lines in the first
    # that do not match keys in the second
    while (key[FNR + offset] != $2) {
        offset = offset + 1
        if (FNR + offset > length(key)) {
            exit
        }
    }
    # when key is found, print corresponding line from the first file
    print data[FNR + offset]
}
ENDFILE {
    # set flag when first file is over
    file = file + 1
}' longer.txt shorter.txt
should do the trick. Given two files
foo 1 bar
foo 2 bar
foo 3 bar
foo 4 bar
and
qux 1 xyzzy
qux 2 xyzzy
qux 4 xyzzy
it prints
foo 1 bar
foo 2 bar
foo 4 bar
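ENDFILE is a GNU awk (gawk) extension, so the script may silently print nothing under other awk implementations. Below is a sketch of the same idea using the common `NR == FNR` idiom instead; this is a swapped-in variant, not the code above, and the temp-file paths and the counter `n` are introduced here for illustration only. It assumes the first file is not empty.

```shell
# Recreate the two sample files from above in a scratch directory.
tmp=$(mktemp -d)
printf 'foo 1 bar\nfoo 2 bar\nfoo 3 bar\nfoo 4 bar\n' > "$tmp/longer.txt"
printf 'qux 1 xyzzy\nqux 2 xyzzy\nqux 4 xyzzy\n' > "$tmp/shorter.txt"

result=$(awk '
NR == FNR {                # true only while reading the first file
    data[FNR] = $0         # whole line
    key[FNR]  = $2         # comparison key (column 2)
    n = FNR                # number of lines stored so far
    next
}
{
    # Advance through the first file until its key matches this line.
    while (FNR + offset <= n && key[FNR + offset] != $2)
        offset++
    if (FNR + offset > n)  # ran off the end: nothing left to match
        exit
    print data[FNR + offset]
}' "$tmp/longer.txt" "$tmp/shorter.txt")
echo "$result"
```

Redirecting the output to a new file, rather than echoing it, gives the cleaned version of the longer file.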
Upvotes: 1