I need to write a script that reads two files and prints the lines they have in common. Both files have the same number of lines, and each line contains only one word.
File 1:
Blue
Red
Orange
Green
Yellow
Blue
File 2:
Blue
Green
Red
Purple
Yellow
Blue
Expected output:
Blue
Yellow
Blue
In the example, Red and Green appear in both files, but not on the same line in each, so they are ignored.
I have tried awk, grep and comm but couldn't get them to work.
I'm looking for the solution that takes the least time to run.
One more way:
awk 'FNR==NR{a[FNR,$1];next}(FNR,$1) in a' file1 file2
Test Results:
$ cat f1
Blue
Red
Orange
Green
Yellow
Blue
$ cat f2
Blue
Green
Red
Purple
Yellow
Blue
$ awk 'FNR==NR{a[FNR,$1];next}(FNR,$1) in a' f1 f2
Blue
Yellow
Blue
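The one-liner above can be spelled out with comments. This is a sketch, assuming a POSIX awk; the wrapper function name `common_lines_by_key` is hypothetical. Each (line number, word) pair from the first file becomes an array key (awk joins the two subscripts with SUBSEP), and a line of the second file is printed only if the same pair was seen:

```shell
#!/bin/sh
# common_lines_by_key is a hypothetical wrapper name, not part of the answer.
# Usage: common_lines_by_key file1 file2
common_lines_by_key() {
  awk '
    FNR == NR {          # true only while reading the first file
      seen[FNR, $1]      # create the key (line number SUBSEP word); no value needed
      next               # skip the pattern below for first-file lines
    }
    (FNR, $1) in seen    # second file: a true pattern with no action prints the line
  ' "$1" "$2"
}
```

Because the key includes the line number, Red and Green from the example never match: their line numbers differ between the two files.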
With paste and awk:
paste -d'|' file1 file2 | awk -F'|' '$1==$2 {print $1}'
I like the use of paste from @Cyrus, but I think comparing the merged lines is easier to understand with awk. Here -F'|' makes awk split on the same | separator that paste inserted, so it is very simple to compare the first field $1 with the second field $2. Either one could be printed, since they are equal.
It is also assumed that | does not appear in the input files. Any other character may be chosen instead.
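Since this approach depends on | never appearing in the data, a wrapper could check that assumption before running. This is a sketch; the function name `same_line_paste` and its error message are hypothetical:

```shell
#!/bin/sh
# same_line_paste is a hypothetical wrapper name.
# Usage: same_line_paste file1 file2
same_line_paste() {
  # A "|" inside either file would create extra fields and break the $1 == $2 test.
  if grep -qF '|' "$1" "$2"; then
    echo "same_line_paste: '|' found in input, pick another delimiter" >&2
    return 1
  fi
  # Join line N of both files as "word1|word2", then print lines whose halves match.
  paste -d'|' "$1" "$2" | awk -F'|' '$1 == $2 { print $1 }'
}
```

The check costs one extra pass over both files, so it is only worth keeping when the input is not under your control.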
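If each line of the input files contains only one word, this shorter version also works, because paste separates the columns with a tab by default and awk splits fields on whitespace by default: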
paste file1 file2 | awk '$1==$2 {print $1}'
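The one-word restriction comes from awk's default whitespace splitting. If lines might contain spaces (but never tabs), one option is to split only on the tab that paste inserts. A sketch, assuming the input contains no tab characters; `same_line_tab` is a hypothetical name:

```shell
#!/bin/sh
# same_line_tab is a hypothetical wrapper name.
# paste joins the files with a tab by default; -F'\t' makes awk split only on that tab,
# so a line such as "light blue" stays a single field.
same_line_tab() {
  paste "$1" "$2" | awk -F'\t' '$1 == $2 { print $1 }'
}
```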
With paste and GNU grep. Step by step.
paste -d '|' file1 file2
Output:
Blue|Blue
Red|Green
Orange|Red
Green|Purple
Yellow|Yellow
Blue|Blue
paste -d '|' file1 file2 | grep -Po '^(.*)\|+\1$'
Output:
Blue|Blue
Yellow|Yellow
Blue|Blue
With \K, which drops the part matched so far (the first word and the |) from the printed match:
paste -d '|' file1 file2 | grep -Po '^(.*)\|+\K\1$'
Output:
Blue
Yellow
Blue
I assume | is not in your files.
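grep -P and \K are GNU extensions. Where they are unavailable, a POSIX basic regular expression with a backreference can do the same job. This swaps grep for sed and is a sketch, again assuming | is not in the files; `same_line_sed` is a hypothetical name:

```shell
#!/bin/sh
# same_line_sed is a hypothetical wrapper name.
# In a basic regular expression an unescaped | is a literal character, and \1 must
# repeat exactly what \(.*\) captured, so only "word|word" lines match.
# -n suppresses default output; the p flag prints lines where the s/// succeeded.
same_line_sed() {
  paste -d '|' "$1" "$2" | sed -n 's/^\(.*\)|\1$/\1/p'
}
```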
Using awk:
awk 'NR == FNR { lines[NR] = $0 } NR != FNR && lines[FNR] == $0 { print }' file1 file2
Explanation:
- For the first file (NR == FNR), build a mapping of line number to value.
- For the second file (NR != FNR), if the current line matches what the corresponding line of the first file has in the cache, print the line.

This reads both files exactly once and uses roughly as much memory as the size of the first file.
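If the memory used by the cache matters, a variation is to read the second file in step with the first using awk's `getline var < file`, so only one line of each file is held at a time. This replaces the array with getline and is a sketch, not the answer's method; `same_line_getline` is a hypothetical name:

```shell
#!/bin/sh
# same_line_getline is a hypothetical wrapper name.
# Usage: same_line_getline file1 file2
same_line_getline() {
  awk -v other="$2" '
    {
      # Fetch the line with the same number from the second file;
      # getline returns 1 on success, 0 at end of file, -1 on error.
      if ((getline line < other) <= 0) exit
      if (line == $0) print
    }
  ' "$1"
}
```

Note that awk -v interprets backslash escape sequences in its value, so this sketch assumes the second file name contains no backslashes.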