Sigurgeir

Reputation: 365

Fast extraction of lines based on line numbers

I am looking for a fast way to extract lines of a file based on a list of line numbers read from a different file in bash.

Define three files:

position_file: Containing a single column of integers

full_data_file: Containing a single column of data

extracted_data_file: Containing those lines in full_data_file whose line numbers match the integers in position_file

My current way of doing this is:

while read position; do
    awk -v pos="$position" 'NR==pos {print; exit}' < full_data_file >> extracted_data_file
done < position_file

The problem is that this is painfully slow: it starts a new awk process and rereads full_data_file from the top for every position. I need to do this for a large number of rather large files, so I was hoping someone might be able to suggest a faster way.

Thank you for your help.

Upvotes: 3

Views: 321

Answers (1)

RomanPerekhrest

Reputation: 92854

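The right way to do this with awk is a single pass: read the positions into an array first, then print each line of the data file whose line number is a key in that array.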

Input files:

$ head pos.txt data.txt
==> pos.txt <==
2
4
6
8
10

==> data.txt <==
a
b
c
d
e
f
g
h
i
j

awk 'NR==FNR{ a[$1]; next }FNR in a' pos.txt data.txt > result.txt

$ cat result.txt
b
d
f
h
j
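
If the requested line numbers all sit near the top of a large data file, it may be worth stopping as soon as the last one has been printed. The following is a minimal sketch of that idea, not part of the original answer: the array name `want` and the counter `n` are my own, and the two `printf` lines only recreate the sample files shown above so the snippet runs on its own.

```bash
#!/usr/bin/env bash
# Recreate the sample inputs from the answer (assumed file names).
printf '%s\n' 2 4 6 8 10 > pos.txt
printf '%s\n' a b c d e f g h i j > data.txt

awk '
    # First file (pos.txt): remember each distinct, non-empty line number.
    NR == FNR {
        if (NF && !($1 in want)) { want[$1]; n++ }
        next
    }
    # Second file (data.txt): print requested lines and
    # stop reading once every requested line has been printed.
    FNR in want {
        print
        if (--n == 0) exit
    }
' pos.txt data.txt > result.txt

cat result.txt
```

Note that, like the one-liner above, this prints lines in the order they appear in data.txt and prints each line once, even if pos.txt is unsorted or contains repeats. The loop in the question instead follows the order of position_file, so the two outputs can differ if the positions are not sorted and unique. It also assumes pos.txt is not empty, since the `NR == FNR` test cannot tell the two files apart otherwise.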

Upvotes: 4
