Ank

Reputation: 51

grep -f for EXACT pattern

I want to extract a list of names from a bigger file (input) that contains each name along with some additional information associated with it. My problem is with the grep -f option: it does not match only the exact entries in the input file, but also other entries that contain a similar name.

I tried:

$ grep -f list.txt -A 1 input >output

The files have the following format:

list.txt

TE_final_35005
TE_final_1040

Input file

>TE_final_10401
ACGTACGTACGTACGT
>TE_final_35005 
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT

Required output:

>TE_final_35005 
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT

Output I am getting:

>TE_final_10401
ACGTACGTACGTACGT
>TE_final_35005 
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT

TE_final_10401 is printed even though it is not in list.txt.

How can I use ^ in the list?

Please help to match the exact value or suggest other ways to do this.

Upvotes: 1

Views: 358

Answers (3)

nullrevolution

Reputation: 4137

as others have mentioned, adding the -w flag is the cleanest and easiest approach based on your sample data. but since you explicitly asked how you could use ^ in list.txt, here's another option.

to add ^ and/or $ anchors to each line in list.txt:

$ cat list.txt
^>TE_final_35005[ ]*$
^>TE_final_1040[ ]*$

each pattern now matches only a line that starts with a > character, continues with the name, and ends with nothing but optional trailing spaces. then your previous command will work (assuming you either remove the blank lines from the input or change your argument to -A 2).

if you'd like to add these anchors to the list file automatically (and delete any blank lines at the same time), use this awk construct:

awk '{if($0 != ""){print "^>"$0"[ ]*$"}}' list.txt >newlist.txt

or if you prefer sed inplace editing:

sed -i '/^[ ]*$/d;s/\(.*\)/^>\1[ ]*$/g' list.txt
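to check the anchored-pattern approach end to end, here is a minimal sketch. it assumes GNU grep and awk; the scratch directory and the file name anchored.txt are made up for illustration, and the sample data is copied from the question. it writes the anchored patterns to a separate file rather than editing list.txt in place.

```shell
# Hypothetical sample data mirroring the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'TE_final_35005' '' 'TE_final_1040' > "$workdir/list.txt"
printf '%s\n' '>TE_final_10401' 'ACGTACGTACGTACGT' \
              '>TE_final_35005 ' 'ACGTACGATCAGT' \
              '>TE_final_1040' 'ACGTACGTACGT' > "$workdir/input"

# Skip blank lines (NF is 0 for them), then wrap each name as ^>NAME[ ]*$
# so the name has to fill the whole header line.
awk 'NF { print "^>" $0 "[ ]*$" }' "$workdir/list.txt" > "$workdir/anchored.txt"

# -f reads one pattern per line; -A 1 also prints the line after each match.
grep -A 1 -f "$workdir/anchored.txt" "$workdir/input" > "$workdir/output"
cat "$workdir/output"
```

with this sample data only the TE_final_35005 and TE_final_1040 records are printed; >TE_final_10401 is skipped because the $ anchor leaves no room for the trailing 1.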

Upvotes: 1

Chris Seymour

Reputation: 85875

A couple of things. First, remove the blank lines from both files:

sed -i '/^\s*$/d' file list

Then -w is used to match whole words only and -A1 will print the next line after the match:

$ grep -w -A1 -f list file > new_file

$ cat new_file
>TE_final_35005
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT
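
As a quick way to see what -w changes, the sketch below counts matching lines with and without it. It assumes GNU grep; the scratch directory is hypothetical and the sample data is taken from the question (blank lines already removed).

```shell
# Hypothetical scratch copies of the question's files, without blank lines.
workdir=$(mktemp -d)
printf '%s\n' 'TE_final_35005' 'TE_final_1040' > "$workdir/list"
printf '%s\n' '>TE_final_10401' 'ACGTACGTACGTACGT' \
              '>TE_final_35005' 'ACGTACGATCAGT' \
              '>TE_final_1040' 'ACGTACGTACGT' > "$workdir/file"

# Without -w, TE_final_1040 also matches as a substring of TE_final_10401.
plain_count=$(grep -c -f "$workdir/list" "$workdir/file")

# With -w, a match must not be directly preceded or followed by a word
# character (letter, digit or underscore), so the trailing 1 rules it out.
word_count=$(grep -c -w -f "$workdir/list" "$workdir/file")

echo "header lines matched without -w: $plain_count"
echo "header lines matched with -w:    $word_count"
```

The count drops from 3 to 2 once -w is added, because >TE_final_10401 no longer matches. Removing blank lines from the list matters because an empty line in a pattern file is an empty pattern, which without -w matches every line.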

Upvotes: 2

Thor

Reputation: 47189

Add the whole word switch (-w):

grep -w -A1 -f list.txt infile

Output:

>TE_final_35005 
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT

Upvotes: 2
