Reputation: 57
I have two files. SCR_location lists SNP locations in ascending order:
19687
36075
...
modi_VCF is a VCF table that has information about every SNP:
19687 G A xxx:255,0,195 xxx:255,0,206
20398 G C 0/0:0,255,255 0/0:0,208,255
...
I want to save just the lines with a matching SNP location into a new file. I wrote the following script, but it doesn't work:
cat SCR_location |while read SCR_l; do
    cat modi_VCF |while read line; do
        if [ "$SCR_l" -eq "$line" ] ;
        then echo "$line" >> file
        else :
        fi
    done
done
Upvotes: 0
Views: 117
Reputation: 22087
Would you please try a bash solution:
declare -A seen
while read -r line; do
    seen[$line]=1
done < SCR_location
while read -r line; do
    read -ra ary <<< "$line"
    if [[ ${seen[${ary[0]}]} ]]; then
        echo "$line"
    fi
done < modi_VCF > file
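The first loop stores each location from SCR_location as a key in the associative array seen, and the second loop prints only the modi_VCF lines whose first field is one of those keys. If awk is an option, you can also say: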
awk 'NR==FNR {seen[$1]++; next} {if (seen[$1]) print}' SCR_location modi_VCF > file
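To try the awk command without touching real data, here is a minimal sketch. It assumes the two sample rows shown in the question are representative; the scratch directory and the variable name demo_dir are only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory so the demo does not overwrite real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Sample inputs copied from the question (first two rows of each file).
printf '%s\n' 19687 36075 > SCR_location
printf '%s\n' \
    '19687 G A xxx:255,0,195 xxx:255,0,206' \
    '20398 G C 0/0:0,255,255 0/0:0,208,255' > modi_VCF

# While reading the first file (NR==FNR), remember each location.
# While reading the second file, print rows whose first field was remembered.
awk 'NR==FNR {seen[$1]++; next} {if (seen[$1]) print}' SCR_location modi_VCF > file

cat file
# prints: 19687 G A xxx:255,0,195 xxx:255,0,206
```

Only the 19687 row is printed, because 20398 does not appear in SCR_location and 36075 does not appear in these two modi_VCF rows.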
[Edit] To output the unmatched lines instead, just negate the test:
awk 'NR==FNR {seen[$1]++; next} {if (!seen[$1]) print}' SCR_location modi_VCF > file_unmatched
The code above outputs the unmatched lines only. If you want to split the matched and the unmatched lines into separate files in one pass, please try:
awk 'NR==FNR {seen[$1]++; next} {if (seen[$1]) {print >> "file_matched"} else {print >> "file_unmatched"} }' SCR_location modi_VCF
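A quick way to check the split is sketched below, again on the two sample rows from the question and in a hypothetical scratch directory (split_dir is an illustrative name). Because `>>` inside awk appends to files that already exist, the sketch deletes old results before running.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory for the demo.
split_dir=$(mktemp -d)
cd "$split_dir" || exit 1

# Sample inputs copied from the question.
printf '%s\n' 19687 36075 > SCR_location
printf '%s\n' \
    '19687 G A xxx:255,0,195 xxx:255,0,206' \
    '20398 G C 0/0:0,255,255 0/0:0,208,255' > modi_VCF

# ">>" appends, so leftovers from an earlier run would otherwise be kept.
rm -f file_matched file_unmatched

# Route each modi_VCF row to one of two files depending on whether its
# first field was seen in SCR_location.
awk 'NR==FNR {seen[$1]++; next}
     {if (seen[$1]) {print >> "file_matched"} else {print >> "file_unmatched"}}' \
    SCR_location modi_VCF

cat file_matched     # prints: 19687 G A xxx:255,0,195 xxx:255,0,206
cat file_unmatched   # prints: 20398 G C 0/0:0,255,255 0/0:0,208,255
```

If rerunning is common, replacing `>>` with `>` in the awk program is another option: awk truncates each output file when it first opens it and then keeps writing to it for the rest of that run.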
Hope this helps.
Upvotes: 1