user3787046

Reputation: 1

Bash Script to Compare Two Files for Matches

I want a bash script that will take Item 1 in File1, traverse all lines in File2, and output whether a match exists. Then continue the pattern: take Item 2 in File1, traverse all lines in File2 for a match, and so on until all lines in File1 have been processed.

Here is some sample data.

File1 - single column of hostnames, using the short name

vsie1p990
vsie1p991
vsie1p992
...

File2 - multi-column, comma-separated; the first column is the hostname (shortname)

format: shortname, IP Address, fqdn

vsie1p992,191.167.44.212,vsie1p992.srv.us.company.com

I tried the following, but something is just not quite right:

#!/bin/bash
echo "Report Generated"
date

count=0

while read list ; do
{
  IT=`grep -i "$list" $2`
  If [ -n "$IT" ] ; then
     echo "Match Found: $list"
     count=`expr "$count" + 1`
  fi
}
done <$1
echo "Total Matches = $count"

Example run: > ./checkit.sh list1 list2

Any help, advice, guidance would be greatly appreciated.

-Richard

Upvotes: 0

Views: 117

Answers (3)

David C. Rankin

Reputation: 84561

It is probably better from an efficiency standpoint to read all of the file_1 search values into a bash array and then use grep to test for the existence of each value in file_2. Here is an example:

#!/bin/bash

# validation checks omitted

declare -a code

code=( $(<"$1") )     # read file1 values into array
szcode=${#code[@]}    # get the number of values read

for ((i = 0; i < szcode; i++)); do

    if grep -q "${code[i]}" "$2" 2>/dev/null; then
        echo " [checking $i of $szcode codes] - ${code[i]} found in $2"
    fi

done

exit 0

output:

[checking 1 of 3 codes] - vsie1p991 found in readtitle.sh

This also allows a great deal of flexibility in what information you get back from grep. For example, it could return the line number of the match, and so on.
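As a minimal sketch of that idea (assuming GNU grep and the same code array and positional parameters used above), you could report the line number of the first match with grep -n:

for ((i = 0; i < szcode; i++)); do
    # -i: case-insensitive, -n: prefix the match with its line number,
    # -m1: stop after the first match (GNU grep)
    lineno=$(grep -in -m1 "${code[i]}" "$2" | cut -d: -f1)
    if [ -n "$lineno" ]; then
        echo " ${code[i]} found in $2 on line $lineno"
    fi
done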

Upvotes: 0

buff

Reputation: 2053

You can pass File1 to grep as a list of patterns:

grep -i -f File1 File2 > result
echo -n "Total matches: "; wc --lines result | cut -d' ' -f1

Upvotes: 2

steveha

Reputation: 76715

I know you asked for a Bash solution, but this Python code should be much faster. Instead of running grep on the entire second file once for each line in the first file, this reads the first file, then matches lines from the second file in one pass.

import sys

if len(sys.argv) != 3:
    print("Usage: match_fnames <file_with_names> <log_file>")
    sys.exit(1)

file_with_names, log_file = sys.argv[1:]

with open(file_with_names, "rt") as f:
    set_of_names = set(line.strip() for line in f)

total_matches = 0
with open(log_file, "rt") as f:
    for line in f:
        fields = line.split(',')
        hostname = fields[0]
        if hostname in set_of_names:
            total_matches += 1
            sys.stdout.write(line)

print("Total matches: {}".format(total_matches))

Put this in a file called match_files.py and then run it with: `python match_files.py filenames.txt logfile.txt`

This will also run perfectly well with Python 3.x.

Upvotes: -1
