user2726449

Reputation: 627

Extracting lines from a file that match a list of IDs

Hopefully I'm going to make sense here...

I have a huge file where each line represents data from a different individual. What I want to do is grep (or extract) the lines for certain individuals. I don't want to keep grepping out individuals one by one and then appending everything together at the end; instead, I was wondering whether there is a loop I can set up by providing a text file with the IDs (i.e. ID001, ID002... ID100), or some other variable that is unique to each individual. I'm fairly new to programming, so I'm not sure what I should be googling/looking for to get the answer - but is this possible in shell?

Apologies for what might be a simple question.

Thanks!

EDIT 1: I'm adding a little more info here. The format might vary slightly, but essentially the file is a genetics file with the following layout:

FAM001 ID001 A A T T TC T A…… A G
FAM001 ID002 A A T T C C A G…… T C
FAM004 ID003 A A T G T G A A…… A G
.
.
FAM100 ID100 G A C T C G T G…… T G

Is it possible to set up a loop, say, similar to (or including) this:

for f in $( cat ~/FAMID.txt )

With the FAMID.txt as:

FAM001
FAM050
FAM087

so that I can run a certain analysis on the individuals with a given family ID, but only for the families in the list provided?
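Something along these lines is what I have in mind (just a rough sketch - bigfile.txt and my_analysis are placeholders for my actual data file and the program I want to run):

while read -r famid; do
    # keep only the lines whose family ID matches, in a per-family file
    grep -w "$famid" bigfile.txt > "${famid}_subset.txt"
    # run the analysis on just that family's individuals
    my_analysis "${famid}_subset.txt"
done < ~/FAMID.txt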

Hope that makes sense.

Upvotes: 2

Views: 3028

Answers (1)

glenn jackman

Reputation: 247042

This is all you need:

grep -wFf FAMID.txt data.txt

where:

  • -f FAMID.txt tells grep to read the patterns from the file
  • -F tells grep that the patterns are fixed strings rather than regular expressions, so it can use a faster literal-matching algorithm
  • -w tells grep to only match patterns that form a whole word (so if you accidentally get "FAM" in the pattern file, you don't match every line of the data file)
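For example, assuming the sample data from the question is saved as data.txt, with the FAMID.txt shown above the command prints only the FAM001 lines (FAM050 and FAM087 don't occur in the sample):

grep -wFf FAMID.txt data.txt
FAM001 ID001 A A T T TC T A…… A G
FAM001 ID002 A A T T C C A G…… T C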

Upvotes: 1
