Dan

Reputation: 23

Bash script to compare 2 files with different length strings

I have two files whose strings I am trying to compare line by line. File1 contains only 6-character prefixes, while File2 contains 12-character strings. How can I loop through File2 to find the strings that start with one of the 6-character prefixes from File1 and write those to a file?

File1

002379
005964

File2

002379ED6212
003354EB4591
004679BB2185
005964AB3379
005964DB5496

Upvotes: 2

Views: 374

Answers (4)

ruakh

Reputation: 183446

For a pure-Bash solution, assuming you're using Bash v4.x or later, you can first populate an associative array whose keys are the lines of File1:

declare -A prefixes
while read -r prefix ; do
    prefixes[$prefix]=1
done < File1

# Now ${prefixes[002379]} is 1, and ${prefixes[005964]} is 1, but
# ${prefixes[anything-else]} is undefined.

And then check the first six characters of each line of File2 to see if it's in this associative array:

while read -r word ; do
    prefix="${word:0:6}"
    if [[ "${prefixes[$prefix]}" ]] ; then
        echo "$word"
    fi
done < File2
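
Since the question asks for the matches to end up in a file, the two loops above can be wrapped in a function whose output is redirected. This is only a minimal sketch, assuming Bash 4 or later; the function name filter_by_prefix and the output file name matches.txt are hypothetical, and the guards against empty lines are an addition that the loops above do not have:

```shell
#!/usr/bin/env bash
# filter_by_prefix PREFIX_FILE DATA_FILE   (hypothetical helper name)
# Prints every line of DATA_FILE whose first six characters
# appear as a line in PREFIX_FILE.
filter_by_prefix() {
    local -A prefixes=()
    local prefix word key

    # Store each non-empty line of the prefix file as an array key.
    while IFS= read -r prefix; do
        if [[ -n $prefix ]]; then
            prefixes[$prefix]=1
        fi
    done < "$1"

    # Look up the first six characters of each data line.
    while IFS= read -r word; do
        key=${word:0:6}
        [[ -n $key ]] || continue      # an empty subscript is an error
        if [[ -n ${prefixes[$key]:-} ]]; then
            printf '%s\n' "$word"
        fi
    done < "$2"
}
```

It could then be called as filter_by_prefix File1 File2 > matches.txt.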

Upvotes: 2

William Pursell

Reputation: 212354

grep -f <(sed 's/^/^/' file1) file2

It would be nice to just use grep -f to find all the lines in file2 that match a regex in file1, but you want to anchor the regexes in file1 to the beginning of the line. So use the above to preprocess the strings by adding an anchor.
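
To make the preprocessing step concrete: sed turns a line such as 002379 into the pattern ^002379, and grep then reads those patterns in place of a file. The sketch below is the same idea written with a pipe instead of process substitution; it assumes a grep that accepts - as "read patterns from standard input" (GNU grep does), and the function name anchored_grep is hypothetical:

```shell
# anchored_grep PREFIX_FILE DATA_FILE   (hypothetical helper name)
# Prints the lines of DATA_FILE that start with any line of PREFIX_FILE.
anchored_grep() {
    # sed prepends ^ to every prefix; grep -f - reads the resulting
    # patterns from standard input (assumed to be supported by this grep).
    sed 's/^/^/' "$1" | grep -f - "$2"
}
```

Two caveats apply to this approach in general. Each prefix is treated as a regular expression, so a prefix containing characters such as . or * would need escaping; and an empty line in the prefix file becomes the bare pattern ^, which matches every line.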

Upvotes: 2

Tom Fenech

Reputation: 74685

This awk one-liner does what you want:

awk 'NR==FNR{a[$0];next}{for(i in a)if(substr($0,1,6)==i)print}' file1 file2

NR==FNR is only true for the first file. Each line of file1 is stored as a key in the array a. next skips the other block. For each record in the second file, loop through each of the keys in a and compare the first 6 characters. If they are the same, print the record.

Output:

002379ED6212
005964AB3379
005964DB5496

Upvotes: 2

iruvar

Reputation: 23364

awk can do this with a direct lookup of the first 6 characters in an array built from File1:

awk 'NR == FNR {a[$0]; next};substr($0, 1, 6) in a' File1 File2
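
For readers who find the one-liner dense, here is the same program spread over several lines with comments. It is a sketch for illustration only; the wrapper name prefix_lookup and the array name seen are hypothetical. The in test is a single array lookup per line, so there is no loop over the stored prefixes:

```shell
# prefix_lookup PREFIX_FILE DATA_FILE   (hypothetical wrapper name)
prefix_lookup() {
    awk '
        # NR == FNR holds only while the first file is being read:
        # remember each of its lines as an array key, then skip ahead.
        NR == FNR { seen[$0]; next }

        # Second file: a pattern with no action prints the line when
        # its first six characters are one of the stored keys.
        substr($0, 1, 6) in seen
    ' "$1" "$2"
}
```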

Upvotes: 2
