Reputation: 51
I want to extract a list of names from a larger file (input) that contains each name along with some additional information associated with it. My problem is with the grep -f option: it matches not only the exact entries in the input file but also other entries that contain a similar name.
I tried:
$ grep -f list.txt -A 1 input >output
The files have the following format:
list.txt
TE_final_35005
TE_final_1040
Input file
>TE_final_10401
ACGTACGTACGTACGT
>TE_final_35005
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT
Required output:
>TE_final_35005
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT
Output I am getting:
>TE_final_10401
ACGTACGTACGTACGT
>TE_final_35005
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT
TE_final_10401 is matched although it is not in list.txt.
How can I use ^ in the list?
Please help me match the exact value, or suggest other ways to do this.
Upvotes: 1
Views: 358
Reputation: 4137
As others have mentioned, adding the -w flag is the cleanest and easiest approach based on your sample data. But since you explicitly asked how you could use ^ in list.txt, here's another option: add ^ and/or $ anchors to each line in list.txt:
$ cat list.txt
^>TE_final_35005[ ]*$
^>TE_final_1040[ ]*$
Each pattern matches a line that starts with a > character followed immediately by the name, and tolerates trailing spaces. Your previous command will then work (assuming you either remove any blank lines from the input file or change your argument to -A 2).
If you'd like to add these anchors to the list file automatically (and delete any blank lines at the same time), use this awk construct:
awk '{if($0 != ""){print "^>"$0"[ ]*$"}}' list.txt >newlist.txt
Or, if you prefer sed in-place editing:
sed -i '/^[ ]*$/d;s/\(.*\)/^>\1[ ]*$/g' list.txt
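To check the approach end to end, here is a minimal sketch that rebuilds the sample data from the question in a scratch directory, generates the anchored pattern file with the awk command above, and runs grep against it. The temporary directory and the variable name anchored_out are only illustrative, and the sample input is assumed to have no blank lines.

```shell
# Scratch directory so no real files are touched (illustrative only).
tmp=$(mktemp -d)

# Sample list from the question, with one blank line to show it gets dropped.
printf '%s\n' 'TE_final_35005' '' 'TE_final_1040' > "$tmp/list.txt"

# Sample input from the question (assumed to contain no blank lines).
printf '%s\n' '>TE_final_10401' 'ACGTACGTACGTACGT' \
              '>TE_final_35005' 'ACGTACGATCAGT' \
              '>TE_final_1040'  'ACGTACGTACGT' > "$tmp/input"

# Wrap every non-empty name as ^>NAME[ ]*$ so it must match the whole header line.
awk '{if($0 != ""){print "^>"$0"[ ]*$"}}' "$tmp/list.txt" > "$tmp/newlist.txt"

# Print each matching header plus the one line that follows it.
anchored_out=$(grep -A 1 -f "$tmp/newlist.txt" "$tmp/input")
printf '%s\n' "$anchored_out"

rm -rf "$tmp"
```

Because the names are inserted into the pattern file unescaped, any regex metacharacters in a name (a dot, for example) would be interpreted as regex syntax rather than literally.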
Upvotes: 1
Reputation: 85875
A couple of things. First, remove the blank lines from the files:
sed -i '/^\s*$/d' file list
Then use -w to match whole words only; -A1 prints the line after each match:
$ grep -w -A1 -f list file > new_file
$ cat new_file
>TE_final_35005
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT
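As a quick sanity check, the sketch below counts matching header lines with and without -w on the sample data from the question. It also shows one caveat worth knowing: -w only requires a non-word character on each side of the match, so a hypothetical header such as >TE_final_1040.2 (not present in the question's data) would still be matched. The scratch directory and variable names are illustrative.

```shell
# Scratch directory so no real files are touched (illustrative only).
tmp=$(mktemp -d)

printf '%s\n' 'TE_final_35005' 'TE_final_1040' > "$tmp/list"
printf '%s\n' '>TE_final_10401' 'ACGTACGTACGTACGT' \
              '>TE_final_35005' 'ACGTACGATCAGT' \
              '>TE_final_1040'  'ACGTACGTACGT' > "$tmp/file"

# Without -w, TE_final_1040 also matches inside TE_final_10401.
plain_count=$(grep -c -f "$tmp/list" "$tmp/file")

# With -w, the match must be delimited by non-word characters or line boundaries.
word_count=$(grep -c -w -f "$tmp/list" "$tmp/file")

# Caveat: "." is not a word character, so this hypothetical header still matches.
dotted_count=$(printf '%s\n' '>TE_final_1040.2' | grep -c -w -f "$tmp/list")

echo "plain: $plain_count, with -w: $word_count, dotted header: $dotted_count"
# prints "plain: 3, with -w: 2, dotted header: 1"

rm -rf "$tmp"
```

If your real identifiers can be followed by a dot, hyphen, or similar suffix, anchoring the pattern to the full header line is the safer choice.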
Upvotes: 2
Reputation: 47189
Add the whole-word switch (-w):
grep -w -A1 -f list.txt infile
Output:
>TE_final_35005
ACGTACGATCAGT
>TE_final_1040
ACGTACGTACGT
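Since the question also asks for other ways to do this, here is a rough awk sketch that compares each header line exactly against the names in the list instead of using regex matching. This is an alternative technique, not part of the grep approach above; the array name want, the flag keep, and the scratch files are all illustrative. It assumes headers consist of > followed by the name with nothing after it, and it keeps every sequence line up to the next header, so records spanning several lines are handled too.

```shell
# Scratch directory so no real files are touched (illustrative only).
tmp=$(mktemp -d)

printf '%s\n' 'TE_final_35005' 'TE_final_1040' > "$tmp/list.txt"
printf '%s\n' '>TE_final_10401' 'ACGTACGTACGTACGT' \
              '>TE_final_35005' 'ACGTACGATCAGT' \
              '>TE_final_1040'  'ACGTACGTACGT' > "$tmp/infile"

awk_out=$(awk '
    # First file: remember each non-empty name, prefixed with ">".
    NR == FNR { if ($0 != "") want[">" $0]; next }
    # Header line in the second file: keep the record only on an exact match.
    /^>/      { keep = ($0 in want) }
    # Print the header and every following line while keep is set.
    keep
' "$tmp/list.txt" "$tmp/infile")

printf '%s\n' "$awk_out"

rm -rf "$tmp"
```

Unlike the anchored patterns, this exact comparison does not tolerate trailing spaces in the header lines.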
Upvotes: 2