BetaByte

Reputation: 75

Comparing two files in BASH line by line

I need to write a script that reads two files and prints the lines that are identical at the same position in both. Both files have the same number of lines, and each line contains exactly one word.

File 1:

Blue
Red
Orange
Green
Yellow
Blue

File 2:

Blue
Green
Red
Purple
Yellow
Blue

Expected output:

Blue
Yellow
Blue

In this example, Red and Green appear in both files, but not on the same line in each, so they are ignored.

I have tried awk, grep and comm but couldn't get them to work.

I am looking for the solution that takes the least time to process.

Upvotes: 3

Views: 3876

Answers (4)

Akshay Hegde

Reputation: 16997

Another way, using awk:

awk 'FNR==NR{a[FNR,$1];next}(FNR,$1) in a' file1 file2

Test Results:

$ cat f1
Blue
Red
Orange
Green
Yellow
Blue

$ cat f2
Blue
Green
Red
Purple
Yellow
Blue

$ awk 'FNR==NR{a[FNR,$1];next}(FNR,$1) in a' f1 f2
Blue
Yellow
Blue
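
For readers unfamiliar with the FNR==NR idiom, the one-liner above can be spelled out with comments. This is only an annotated sketch of the same command: the array is renamed from a to seen for readability, and the temporary directory and the awk_result variable are illustrative names that are not part of the answer.

```shell
# Sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' Blue Red Orange Green Yellow Blue > "$tmp/f1"
printf '%s\n' Blue Green Red Purple Yellow Blue > "$tmp/f2"

awk_result=$(awk '
    # First file: FNR (line number within the file) equals NR (overall line number).
    # Referencing seen[FNR, $1] creates that key; no value is needed.
    FNR == NR { seen[FNR, $1]; next }
    # Second file: a pattern with no action prints the line when it is true,
    # i.e. when the same (line number, word) pair was recorded for the first file.
    (FNR, $1) in seen
' "$tmp/f1" "$tmp/f2")
printf '%s\n' "$awk_result"

rm -r "$tmp"
```

Because the key uses $1, only the first word of each line is compared, which matches the question's one-word-per-line input.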

Upvotes: 2

Javier Elices

Reputation: 2154

With paste and awk:

paste -d'|' file1 file2 | awk -F'|' '$1==$2 {print $1}'

I like the use of paste from @Cyrus, but I think the comparison of the merged lines is easier to understand with awk. Here -F sets awk's field separator to the same | that paste inserted, so it is simple to compare the first field $1 with the second field $2. Since the two are equal on a match, either one can be printed.

This assumes that | does not appear in the input files. Any other character that does not appear in them can be used instead.

If each line of the input files contains only one word, this shorter version also works, because paste joins with a tab by default and awk splits fields on whitespace by default:

paste file1 file2 | awk '$1==$2 {print $1}'
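
If lines may contain spaces, the short form would compare only the first two whitespace-separated words. One possible workaround, sketched below, is to keep paste's default tab and tell awk to split on that tab only. The two-word sample lines, the temporary directory, and the tab_result variable are made up for illustration, and the sketch assumes the input contains no tab characters.

```shell
# Sketch with made-up multi-word sample lines (not from the question).
tmp=$(mktemp -d)
printf '%s\n' 'light blue' 'dark red' > "$tmp/file1"
printf '%s\n' 'light blue' 'dark green' > "$tmp/file2"

# paste joins with a tab by default; -F'\t' makes awk split on that tab only,
# so $1 and $2 are the complete lines from file1 and file2.
tab_result=$(paste "$tmp/file1" "$tmp/file2" | awk -F'\t' '$1 == $2 { print $1 }')
printf '%s\n' "$tab_result"

rm -r "$tmp"
```

With the default field splitting, the first merged line would give $1 = light and $2 = blue, so the matching line would not be printed.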

Upvotes: 2

Cyrus

Reputation: 88601

With paste and GNU grep. Step by step.

paste -d '|' file1 file2

Output:

Blue|Blue
Red|Green
Orange|Red
Green|Purple
Yellow|Yellow
Blue|Blue

paste -d '|' file1 file2 | grep -Po '^(.*)\|+\1$'

Output:

Blue|Blue
Yellow|Yellow
Blue|Blue

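With \K, which drops everything matched so far from the output, so only the second copy of the word is printed: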

paste -d '|' file1 file2 | grep -Po '^(.*)\|+\K\1$'

Output:

Blue
Yellow
Blue

I assume | is not in your files.
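
Where grep -P is not available, roughly the same filtering could be done with a sed back-reference. This is a swapped-in variant and not part of the answer above. It makes the same assumption that | does not occur in the files, and the temporary directory and sed_result variable are illustrative names.

```shell
# Sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' Blue Red Orange Green Yellow Blue > "$tmp/file1"
printf '%s\n' Blue Green Red Purple Yellow Blue > "$tmp/file2"

# \(.*\) captures the part before the |, and \1 requires the same text after it.
# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, reduced to a single copy of the word.
sed_result=$(paste -d '|' "$tmp/file1" "$tmp/file2" | sed -n 's/^\(.*\)|\1$/\1/p')
printf '%s\n' "$sed_result"

rm -r "$tmp"
```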

Upvotes: 4

janos

Reputation: 124646

Using awk:

awk 'NR == FNR { lines[NR] = $0 } NR != FNR && lines[FNR] == $0 { print }' file1 file2

Explanation:

  • While reading the first file (NR == FNR), build a mapping from line number to line content
  • While reading the second file (NR != FNR), print the current line if it equals the line stored for the same line number

This reads both files exactly once, and uses roughly as much memory as the size of the first file.
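
If holding the first file in memory is a concern, the same comparison can be sketched as a plain shell loop that reads one line from each file per iteration through two file descriptors. This is an alternative for illustration, not the method in this answer; same_lines is a hypothetical helper name. A shell read loop is usually much slower than awk on large files, so the trade-off here is memory, not speed.

```shell
# Sketch: print lines that are identical at the same position in both files.
# same_lines is a hypothetical helper; the ( ) body runs in a subshell so
# the loop variables do not leak into the caller.
same_lines() (
    # File 1 is opened on fd 3 and file 2 on fd 4; each iteration reads
    # one line from each, stopping when either file runs out.
    while IFS= read -r a <&3 && IFS= read -r b <&4; do
        if [ "$a" = "$b" ]; then
            printf '%s\n' "$a"
        fi
    done 3<"$1" 4<"$2"
)

# Sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' Blue Red Orange Green Yellow Blue > "$tmp/file1"
printf '%s\n' Blue Green Red Purple Yellow Blue > "$tmp/file2"
same_lines "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
```

Only one line from each file is held at a time, whereas the awk version keeps every line of the first file in the lines array.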

Upvotes: 6
