Steven C. Howell

Reputation: 18584

Use grep to find either of two strings without changing the order of the lines?

I'm sure this has been asked before, but I can't find it, so my apologies for any redundancy.

I want to use grep or egrep to find every line that contains either ' P ' or ' CA ' and redirect those lines to a new file. I can easily do it for one string or the other using:

egrep ' CA ' all.pdb > CA.pdb

or

egrep ' P ' all.pdb > P.pdb

I'm new to regex, so I'm not sure of the syntax for "or".

Update: The order of the output lines is important, i.e. I do not want the output to be sorted by which string a line matched. Here is an example of the first few lines of one file:

ATOM      1 N    THR U  27     -68.535  88.128 -17.857  1.00  0.00      1H5  N  
ATOM      2 HT1  THR U  27     -69.437  88.216 -17.434  0.00  0.00      1H5  H  
ATOM      3 HT2  THR U  27     -68.270  87.165 -17.902  0.00  0.00      1H5  H  
ATOM      4 HT3  THR U  27     -68.551  88.520 -18.777  0.00  0.00      1H5  H  
ATOM      5 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C  
ATOM      6 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P  
ATOM      8 HB   THR U  27     -68.543  88.566 -15.171  0.00  0.00      1H5  H  
ATOM      9 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C  
ATOM     10 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P  
ATOM     11 HB   THR U  27     -68.543  88.566 -15.171  0.00  0.00      1H5  H  
ATOM     12 C    SER D   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 C  
ATOM     13 OP1  SER D   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 O  

and I want the result file for this example to be:

ATOM      5 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C  
ATOM      6 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P  
ATOM      9 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C  
ATOM     10 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P  

Upvotes: 19

Views: 21203

Answers (3)

Raghuram

Reputation: 52625

On macOS Ventura, the following does the trick:

grep -e ' CA ' -e ' P ' all.pdb > CA.pdb

From the grep man page:

-e pattern, --regexp=pattern Specify a pattern used during the search of the input: an input line is selected if it matches any of the specified patterns. This option is most useful when multiple -e options are used to specify multiple patterns, or when a pattern begins with a dash (‘-’).
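To see that this keeps the original line order, here is a minimal sketch. The sample rows are abbreviated from the question and the temporary file is only for illustration; -n is added so each match is printed with its input line number.

```shell
#!/bin/sh
# Hypothetical sample data, shortened from the rows in the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/all.pdb" <<'EOF'
ATOM      4 HT3  THR U  27
ATOM      5 CA   LYS B 122
ATOM      6 P    THY J   2
ATOM      8 HB   THR U  27
ATOM      9 CA   LYS B 122
ATOM     10 P    THY J   2
EOF

# A line is selected if it matches either pattern.
# -n prefixes each selected line with its line number in the input.
matches=$(grep -n -e ' CA ' -e ' P ' "$tmpdir/all.pdb")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

The selected lines should be input lines 2, 3, 5 and 6, in that order: grep reads the file once from top to bottom and prints each matching line as it goes, so the output is not grouped by pattern.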

Upvotes: 1

fedorqui

Reputation: 289495

You can use grep like this:

grep ' P \| CA ' file > new_file

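The | expression indicates "or". In grep's default basic regular expressions a plain | is a literal character, so we have to escape it as \| to tell grep that it has a special meaning. (Treating \| as "or" is a GNU extension to basic regular expressions, so it may not work in every grep.)

You can avoid the escaping, and get something a bit neater, with extended regular expressions (grep -E):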

grep -E ' (P|CA) ' file > new_file

In general, I prefer the awk syntax, since it is clearer and easier to extend:

awk '/ P / || / CA /' file

Or, given your sample input, you can have awk check whether the 3rd column is exactly CA or P:

$ awk '$3=="CA" || $3=="P"' file
ATOM      5 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C
ATOM      6 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P
ATOM      9 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C
ATOM     10 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P
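The column check can matter in practice, because ' P ' may also appear in another column (the fifth column holds single letters such as U, B and J in your sample). A minimal sketch with made-up rows follows. It assumes the first three columns are separated by whitespace, which is not guaranteed in a fixed-width format where fields can touch.

```shell
#!/bin/sh
# Hypothetical rows: the first two have the single letter P in the 5th column.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.pdb" <<'EOF'
ATOM      1 N    THR P  27
ATOM      2 CA   THR P  27
ATOM      3 P    THY J   2
ATOM      4 OP1  THY J   2
EOF

# Substring match: counts every line containing ' P ' or ' CA ' anywhere.
grep_count=$(grep -c -E ' (P|CA) ' "$tmpdir/sample.pdb")

# Field match: counts only lines whose 3rd whitespace-separated field is CA or P.
awk_count=$(awk '$3=="CA" || $3=="P" {n++} END {print n+0}' "$tmpdir/sample.pdb")

echo "grep matched $grep_count lines, awk matched $awk_count lines"

rm -rf "$tmpdir"
```

Here grep selects three lines (the first one only because of the P in the fifth column), while awk selects two.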

Test

$ cat file
hello P is here and CA also
but CA appears
nothing here
P CA
$ grep ' P \| CA ' file
hello P is here and CA also
but CA appears
$ grep -E ' (P|CA) ' file
hello P is here and CA also
but CA appears
$ awk '/ P / || / CA /' file
hello P is here and CA also
but CA appears

Upvotes: 29

cn0047

Reputation: 17051

The following command searches recursively through all files under the directory /path_to_your_dir/ and writes the matches to /tmp/grep.log:

grep -Er ' (P|CA) ' /path_to_your_dir/ > /tmp/grep.log

For a case-insensitive search, replace -Er with -Eri.
Each line of /tmp/grep.log shows the path of the file followed by the matching line.
To search only files with a specific extension, use something like:

grep -Er --include='*.php' ' (P|CA) ' /path_to_your_dir/ > /tmp/grep.log
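As a rough illustration of the recursive search with an extension filter, here is a small sketch. The directory layout and file names are made up, and it filters on .pdb instead of .php to fit the question. The glob is quoted so the shell does not try to expand it (zsh, for example, reports an error for an unmatched unquoted glob by default).

```shell
#!/bin/sh
# Hypothetical directory tree: two .pdb files and one .txt file.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
printf 'ATOM 1 CA LYS\nATOM 2 N LYS\n' > "$tmpdir/a.pdb"
printf 'ATOM 3 P THY\n' > "$tmpdir/sub/c.pdb"
printf 'ATOM 4 CA LYS\n' > "$tmpdir/notes.txt"

# -r descends into subdirectories; --include limits the search to *.pdb files.
# The order in which files are visited is not guaranteed, hence the sort.
hits=$(grep -Er --include='*.pdb' ' (P|CA) ' "$tmpdir" | sort)
printf '%s\n' "$hits"

# Count the non-empty result lines.
hit_count=$(printf '%s\n' "$hits" | grep -c .)

rm -rf "$tmpdir"
```

Each result line has the form path:matching line. Within a single file the matches keep their original order, but the order across files depends on how the directory is traversed.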

Hope it will help you.

Upvotes: 0
