Extract rows in file where a column value is included in a list?

Question

I have a huge file of data:

datatable.txt

id1 england male
id2 germany female
... ... ...

I have another list of data:

indexes.txt

id1
id3
id6
id10
id11

I want to extract all rows from datatable.txt where the id is included in indexes.txt.

Is it possible to do this with awk/sed/grep? The files are so large that using R or Python is not convenient.

Inian · Accepted Answer

You just need a simple awk command:

awk 'FNR==NR {a[$1]; next} $1 in a' indexes.txt datatable.txt
id1 england male

FNR==NR is true only while awk is reading the first file, indexes.txt, because the per-file line counter FNR equals the overall line counter NR only there. For each of those lines, {a[$1]; next} creates a key in the array a from the first column and skips to the next line, so by the end of the file every id is stored as a key.
Once awk moves on to datatable.txt, the pattern $1 in a is true for every row whose first column is one of the stored ids. Since the pattern has no action, awk prints the matching row.
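
To see the two-file pattern end to end, here is a minimal sketch that builds small sample files in a scratch directory and runs the same command on them. The id3 row, the array name ids, and the workdir and result variables are made up for illustration; they are not from the question.

```shell
#!/bin/sh
# Hypothetical sample data; only the first two rows come from the question.
workdir=$(mktemp -d)
printf '%s\n' 'id1 england male' 'id2 germany female' 'id3 france male' > "$workdir/datatable.txt"
printf '%s\n' 'id1' 'id3' 'id6' > "$workdir/indexes.txt"

# First file: store each id as an array key, then skip to the next line.
# Second file: print every line whose first field is a stored key.
result=$(awk 'FNR==NR { ids[$1]; next } $1 in ids' "$workdir/indexes.txt" "$workdir/datatable.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Only the ids are held in memory, and datatable.txt is streamed line by line, so this should scale to a large data file as long as the id list fits in memory. If the columns are tab-separated and may contain spaces, passing -F'\t' to awk is probably what you want.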
