Parsa
Parsa

Reputation: 3236

Extract rows in file where a column value is included in a list?

I have a huge file of data:

datatable.txt

id1 england male
id2 germany female
... ... ...

I have another list of data:

indexes.txt

id1
id3
id6
id10
id11

I want to extract all rows from datatable.txt where the id is included in indexes.txt.

Is it possible to do this with awk/sed/grep? The file sizes are so large using R or python is not convenient.

Upvotes: 2

Views: 1860

Answers (2)

Inian
Inian

Reputation: 85663

You just need a simple awk as

awk 'FNR==NR {a[$1]; next}; $1 in a' indexes.csv datatable.csv
id1 england male
  1. FNR==NR{a[$1];next} will process on indexes.csv storing the entries of the array as the content of the first column till the end of the file.
  2. Now on datatable.csv, I can match those rows from the first file by doing $1 in a which will give me all those rows in current file whose column $1's value a[$1] is same as in other file.

Upvotes: 4

goeddek
goeddek

Reputation: 109

maybe i overlook something, but i build two test files:

a1:

id1
id2
id3
id6
id9
id10

and

a2:

id1 a 1
id2 b 2
id3 c 3
id4 c 4
id5 e 5
id6 f 6
id7 g 7
id8 h 8
id9 i 9
id10 j 10

with

join a1 a2 2> /dev/null

I get all lines matched by column one.

Upvotes: 0

Related Questions