Reputation: 177
I would like grep to print out all complete words that include the match.
Google did not help me. Here is what I tried:
cat file.txt
21676 Mm.24685 NM_009346 ENSMUSG00000055320
20349 Mm.134093 NM_011348 ENSMUSG00000063531
12456 Mm.134000 NM_011228 GM415666
grep -o "ENSMUS" file.txt
ENSMUS
ENSMUS
Desired output:
ENSMUSG00000055320
ENSMUSG00000063531
Thanks for your help!
Upvotes: 2
Views: 79
Reputation: 12347
To extract ENSEMBL mouse accession numbers without the version number:
grep -Po 'ENSMUS\w+' in_file
With the version number:
grep -Po 'ENSMUS\S+' in_file
Here,
\w+ : 1 or more word characters ([A-Za-z0-9_]).
\S+ : 1 or more non-whitespace characters (you can also be more restrictive and use [\w.]+, which is 1 or more word characters or literal dots).
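To make the difference between the two patterns concrete, here is a small sketch. The ".7" version suffix on the ID is made up for illustration (the file in the question has no versioned IDs), and -P assumes GNU grep built with PCRE support.

```shell
# Hypothetical sample line: the ".7" version suffix is invented for illustration.
line='21676 Mm.24685 NM_009346 ENSMUSG00000055320.7'

# \w+ stops at the dot, so the version suffix is dropped.
without_version=$(printf '%s\n' "$line" | grep -Po 'ENSMUS\w+')

# \S+ runs up to the next whitespace, so the suffix is kept.
with_version=$(printf '%s\n' "$line" | grep -Po 'ENSMUS\S+')

printf '%s\n%s\n' "$without_version" "$with_version"
```

This should print ENSMUSG00000055320 followed by ENSMUSG00000055320.7.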
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print only the matches, each on its own output line, not the entire lines.
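As a quick check of how -o behaves when one input line contains several matches, here is a minimal sketch. The input line is hypothetical: it simply puts two IDs from the question on the same line. Again this assumes GNU grep with -P available.

```shell
# Hypothetical input: one line holding two of the IDs from the question.
line='ENSMUSG00000055320 ENSMUSG00000063531'

# -o emits every match separately, so this single input line
# produces two output lines.
matches=$(printf '%s\n' "$line" | grep -Po 'ENSMUS\w+')

printf '%s\n' "$matches"
```

Each ID ends up on its own line, which makes the output easy to pipe into sort or uniq.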
SEE ALSO:
grep manual
perlre - Perl regular expressions
Upvotes: 1
Reputation: 785316
You may use:
grep -wo "ENSMUS[^[:blank:]]*" file.txt
ENSMUSG00000055320
ENSMUSG00000063531
Here [^[:blank:]]* will match 0 or more characters that are not blanks (spaces or tabs). -w will ensure full word matches.
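To see what -w changes, here is a small sketch on a made-up line where the pattern also occurs inside a longer word. The "X" prefix on the first token is hypothetical and only there to trigger the difference.

```shell
# Hypothetical input: the first token embeds the pattern inside a longer word.
line='XENSMUSG00000055320 ENSMUSG00000063531'

# Without -w, the embedded occurrence is reported as well.
loose=$(printf '%s\n' "$line" | grep -o "ENSMUS[^[:blank:]]*")

# With -w, only a match that stands as a whole word is reported.
strict=$(printf '%s\n' "$line" | grep -wo "ENSMUS[^[:blank:]]*")

printf 'without -w:\n%s\nwith -w:\n%s\n' "$loose" "$strict"
```

Without -w both IDs are printed; with -w only ENSMUSG00000063531 is, because the first occurrence is preceded by a word character.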
Upvotes: 1