Reputation: 3
I have two files, one with 17k lines and another with 4k lines. I want to compare positions 115 to 125 of each line in the first file with each line in the second file and, if there is a match, write the entire line from the first file to a new file. I came up with a solution that reads the file using 'cat $filename | while read LINE', but it takes around 8 minutes to complete. Is there another way, such as using 'awk', to reduce the processing time?
My code:
cat $filename | while read LINE
do
#read 115 to 125 and then remove trailing spaces and leading zeroes
vid=`echo "$LINE" | cut -c 115-125 | sed 's,^ *,,; s, *$,,' | sed 's/^[0]*//'`
exist=0
#match vid with entire line in id.txt
exist=`grep -x "$vid" $file_dir/id.txt | wc -l`
if [[ $exist -gt 0 ]]; then
echo "$LINE" >> $dest_dir/id.txt
fi
done
Upvotes: 0
Views: 717
Reputation: 85865
How about this:
FNR==NR {                      # FNR == NR is only true in the first file
    s = substr($0,115,11)      # Columns 115-125 inclusive (11 characters, like cut -c 115-125)
    sub(/^[[:space:]]+/,"",s)  # Remove any leading whitespace (\s is gawk-only)
    sub(/[[:space:]]+$/,"",s)  # Remove any trailing whitespace
    sub(/^0+/,"",s)            # Remove leading zeroes, as the question does
    lines[s] = (s in lines ? lines[s] ORS : "") $0  # Store the line under its key, keeping duplicates
    next                       # Get next line in first file
}
{                              # Now in second file
    for(i in lines)            # For each key in the array
        if (i~$0) {            # If the key matches the current line of the second file (used as a regex)
            print lines[i]     # Print the matching line(s) from file1
            next               # Get next line in second file
        }
}
Save it as script.awk and run it like this:
$ awk -f script.awk "$filename" "${file_dir}/id.txt" > "${dest_dir}/id.txt"
This will still be slow, because for each line in the second file you have to scan, on average, about 50% of the unique keys from the first file (assuming most lines do in fact match). It can be sped up significantly if you can confirm that the lines in the second file are full-line matches against the substrings.
For full line matches this should be faster:
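FNR==NR {                      # FNR == NR is only true in the first file
    s = substr($0,115,11)      # Columns 115-125 inclusive (11 characters, like cut -c 115-125)
    sub(/^[[:space:]]+/,"",s)  # Remove any leading whitespace (\s is gawk-only)
    sub(/[[:space:]]+$/,"",s)  # Remove any trailing whitespace
    sub(/^0+/,"",s)            # Remove leading zeroes, as the question does
    lines[s] = (s in lines ? lines[s] ORS : "") $0  # Store the line under its key, keeping duplicates
    next                       # Get next line in first file
}
($0 in lines) {                # Now in second file: exact lookup of the whole line as a key
    print lines[$0]            # Print the matching line(s) from file1
}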
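If the output should keep the order of the first file, a variant worth trying is to load the ids first and then filter the big file in a single pass. The following is only a sketch on made-up data: data.txt, the sample records and the temporary directory are assumptions for the demo, not the asker's real files. It also assumes id.txt is not empty, since FNR==NR would otherwise be true for the second file too.

```shell
#!/bin/sh
# Demo on hypothetical data: build two small files, then filter the first by the ids in the second.
tmp=$(mktemp -d)

# Made-up fixed-width records: 114 filler characters, then an 11-character id field (columns 115-125).
pad=$(printf '%114s' '' | tr ' ' 'x')
{
  printf '%s%-11s%s\n' "$pad" '00042' ' first'
  printf '%s%-11s%s\n' "$pad" '00007' ' second'
  printf '%s%-11s%s\n' "$pad" '42' ' third'
} > "$tmp/data.txt"
printf '42\n99\n' > "$tmp/id.txt"

# id.txt is read first (FNR == NR), so only the small file is held in memory.
result=$(awk '
FNR == NR { ids[$0]; next }        # Remember every id line as an array key
{
    s = substr($0, 115, 11)        # Columns 115-125 inclusive
    sub(/^[ \t]+/, "", s)          # Trim leading blanks
    sub(/[ \t]+$/, "", s)          # Trim trailing blanks
    sub(/^0+/, "", s)              # Drop leading zeroes
}
(s in ids)                         # Print the record when its key is a known id
' "$tmp/id.txt" "$tmp/data.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Note the swapped argument order (id file first, data file second). With the sample data this prints the records ending in "first" and "third", in that order, because both reduce to the key 42. Each lookup is a single hash access, so the run time grows with the total number of lines rather than with the product of the two file sizes.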
Upvotes: 2