I am looking for a fast way to extract lines of a file based on a list of line numbers read from a different file in bash.
Define three files:
position_file: Containing a single column of integers
full_data_file: Containing a single column of data
extracted_data_file: Containing those lines in full_data_file whose line numbers match the integers in position_file
My current approach is:
# Slow: starts a new awk process for every position and rescans full_data_file from the top each time
while read -r position; do
    awk -v pos="$position" 'NR==pos {print; exit}' < full_data_file >> extracted_data_file
done < position_file
The problem is that this is painfully slow and I'm trying to do this for a large number of rather large files. I was hoping someone might be able to suggest a faster way.
Thank you for your help.
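The right way is a single awk command that reads each file only once. While reading pos.txt (NR==FNR), it stores each line number as a key of array a; while reading data.txt, it prints a line whenever its line number FNR is a key of a.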
Input files:
$ head pos.txt data.txt
==> pos.txt <==
2
4
6
8
10
==> data.txt <==
a
b
c
d
e
f
g
h
i
j
$ awk 'NR==FNR{ a[$1]; next } FNR in a' pos.txt data.txt > result.txt
$ cat result.txt
b
d
f
h
j
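The command above prints the matches in data.txt order and prints each line at most once, whereas the loop in the question follows position_file order and repeats duplicated positions. If that difference matters, the following is a minimal sketch of a variant that keeps the question's output order and also stops reading the data file once it passes the largest requested line number, which can help with very large files. `extract_lines` is a hypothetical helper name, and the sketch assumes one integer per line in the position file.

```shell
#!/usr/bin/env bash
# extract_lines POSITION_FILE DATA_FILE   (hypothetical helper name)
# Prints the lines of DATA_FILE whose line numbers appear in POSITION_FILE,
# in POSITION_FILE order, repeating lines whose number is listed twice.
extract_lines() {
    awk '
        BEGIN { max = 0 }
        # First file: record each requested line number, the order it was
        # requested in, and the largest number requested.
        FILENAME == ARGV[1] {
            p = $1 + 0
            want[p]
            order[++n] = p
            if (p > max) max = p
            next
        }
        # Second file: keep only the requested lines in memory.
        FNR in want { line[FNR] = $0 }
        # Nothing past max was requested, so stop reading (END still runs).
        FNR >= max { exit }
        END {
            # Print in request order; numbers beyond the end of the data are skipped.
            for (i = 1; i <= n; i++)
                if (order[i] in line) print line[order[i]]
        }
    ' "$1" "$2"
}
```

Usage: `extract_lines position_file full_data_file > extracted_data_file`. The trade-off is that the requested lines are held in memory until the end, instead of being streamed out as in the one-liner.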