Reputation: 3236
I have a huge file of data:
datatable.txt
id1 england male
id2 germany female
... ... ...
I have another list of data:
indexes.txt
id1
id3
id6
id10
id11
I want to extract all rows from datatable.txt where the id is included in indexes.txt.
Is it possible to do this with awk/sed/grep? The file sizes are so large using R or python is not convenient.
Upvotes: 2
Views: 1860
Reputation: 85663
You just need a simple awk
as
awk 'FNR==NR {a[$1]; next}; $1 in a' indexes.csv datatable.csv
id1 england male
FNR==NR{a[$1];next}
will process on indexes.csv
storing the
entries of the array as the content of the first column till the end of
the file. datatable.csv
, I can match those rows from the first file by doing
$1 in a
which will give me all those rows in current file whose
column $1
's value a[$1]
is same as in other file.Upvotes: 4
Reputation: 109
maybe i overlook something, but i build two test files:
a1:
id1
id2
id3
id6
id9
id10
and
a2:
id1 a 1
id2 b 2
id3 c 3
id4 c 4
id5 e 5
id6 f 6
id7 g 7
id8 h 8
id9 i 9
id10 j 10
with
join a1 a2 2> /dev/null
I get all lines matched by column one.
Upvotes: 0