Reputation: 18584
I'm sure this has been asked but I can't find it so my apologies for redundancy.
I want to use grep or egrep to find every line that contains either ' P ' or ' CA ' and write those lines to a new file. I can easily do it for one string or the other using:
egrep ' CA ' all.pdb > CA.pdb
or
egrep ' P ' all.pdb > P.pdb
I'm new to regex, so I'm not sure of the syntax for "or".
Update: The order of the output lines is important, i.e. I do not want the output grouped by which string each line matched. Here is an example of the first lines of one file:
ATOM 1 N THR U 27 -68.535 88.128 -17.857 1.00 0.00 1H5 N
ATOM 2 HT1 THR U 27 -69.437 88.216 -17.434 0.00 0.00 1H5 H
ATOM 3 HT2 THR U 27 -68.270 87.165 -17.902 0.00 0.00 1H5 H
ATOM 4 HT3 THR U 27 -68.551 88.520 -18.777 0.00 0.00 1H5 H
ATOM 5 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 6 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
ATOM 8 HB THR U 27 -68.543 88.566 -15.171 0.00 0.00 1H5 H
ATOM 9 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 10 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
ATOM 11 HB THR U 27 -68.543 88.566 -15.171 0.00 0.00 1H5 H
ATOM 12 C SER D 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 C
ATOM 13 OP1 SER D 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 O
and I want the result file for this example to be:
ATOM 5 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 6 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
ATOM 9 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 10 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
Upvotes: 19
Views: 21203
Reputation: 52625
On macOS Ventura, the following does the trick:
grep -e ' CA ' -e ' P ' all.pdb > CA.pdb
From the man page of grep:
-e pattern, --regexp=pattern Specify a pattern used during the search of the input: an input line is selected if it matches any of the specified patterns. This option is most useful when multiple -e options are used to specify multiple patterns, or when a pattern begins with a dash (‘-’).
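As a quick check that multiple -e patterns keep the lines in their original order, here is a minimal sketch. The sample file is made up for illustration (shortened lines in the style of the question), and the variable names tmp and result are arbitrary.

```shell
# Build a small sample file (contents invented for this sketch).
tmp=$(mktemp)
printf '%s\n' \
  'ATOM 1 N THR U 27' \
  'ATOM 5 CA LYS B 122' \
  'ATOM 6 P THY J 2' \
  'ATOM 8 HB THR U 27' \
  'ATOM 9 CA LYS B 122' > "$tmp"

# Each -e adds one pattern; a line is printed if it matches any of them.
# grep reads the file once, so matches come out in input order.
result=$(grep -e ' CA ' -e ' P ' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

The CA and P lines should come out interleaved exactly as they appear in the input, not grouped by pattern.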
Upvotes: 1
Reputation: 289495
You can use grep like this:
grep ' P \| CA ' file > new_file
The \| expression indicates "or". We have to escape the | in order to tell grep that it has a special meaning.
You can avoid this escaping and use something fancier with extended grep:
grep -E ' (P|CA) ' file > new_file
In general, I prefer the awk syntax, since it is clearer and easier to extend:
awk '/ P / || / CA /' file
Or, given your sample input, you can use awk to check whether the 3rd column is exactly one of the two strings:
$ awk '$3=="CA" || $3=="P"' file
ATOM 5 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 6 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
ATOM 9 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 10 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
$ cat file
hello P is here and CA also
but CA appears
nothing here
P CA
$ grep ' P \| CA ' file
hello P is here and CA also
but CA appears
$ grep -E ' (P|CA) ' file
hello P is here and CA also
but CA appears
$ awk '/ P / || / CA /' file
hello P is here and CA also
but CA appears
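To illustrate why the column-based awk check can be easier to extend, here is a rough sketch that keeps the wanted atom names in a list. The sample lines and the names variable are assumptions for the example; the third sample line has P in the chain column (5th field), which the space-delimited grep pattern also picks up.

```shell
# Sample data invented for this sketch; the third line has "P" as the
# chain identifier (5th field), not as the atom name (3rd field).
tmp=$(mktemp)
printf '%s\n' \
  'ATOM 5 CA LYS B 122' \
  'ATOM 6 P THY J 2' \
  'ATOM 7 C SER P 3' \
  'ATOM 8 HB THR U 27' > "$tmp"

# Hypothetical list of atom names to keep; add more names as needed.
names='CA P'

# Split the list into an array, then print lines whose 3rd field is in it.
awk_result=$(awk -v names="$names" '
  BEGIN { n = split(names, list, " "); for (i = 1; i <= n; i++) keep[list[i]] = 1 }
  ($3 in keep)
' "$tmp")

# For comparison: the pattern match also catches the chain "P" line.
grep_result=$(grep -E ' (P|CA) ' "$tmp")

printf 'awk:\n%s\n\ngrep:\n%s\n' "$awk_result" "$grep_result"
rm -f "$tmp"
```

In both cases the lines stay in input order; the awk version just restricts the comparison to one field.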
Upvotes: 29
Reputation: 17051
The next command searches all files in the directory /path_to_your_dir/ and writes the matches to /tmp/grep.log:
grep -Er ' (P|CA) ' /path_to_your_dir/ > /tmp/grep.log
If you need a case-insensitive search, replace -Er with -Eri.
In the file /tmp/grep.log you will see the path to each file followed by the matching line.
If you need to search only files with a specific extension, write something like this (the glob is quoted so the shell does not expand it):
grep -Er --include='*.php' ' (P|CA) ' /path_to_your_dir/ > /tmp/grep.log
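Here is a small, self-contained sketch of the recursive form, assuming a grep that supports -r and --include (GNU and BSD grep do; they are not part of POSIX). The directory layout and file names are invented for the example, and it filters on *.pdb to match the question's data.

```shell
# Throwaway directory tree with made-up files.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf '%s\n' 'ATOM 5 CA LYS B 122' 'ATOM 8 HB THR U 27' > "$dir/a.pdb"
printf '%s\n' 'ATOM 6 P THY J 2' > "$dir/sub/b.pdb"
printf '%s\n' 'ATOM 9 CA LYS B 122' > "$dir/notes.txt"

# Search recursively, but only in files whose name matches *.pdb.
# Each output line has the form path:matching-line.
# sort only makes the file order predictable for this demo.
result=$(grep -Er --include='*.pdb' ' (P|CA) ' "$dir" | sort)
printf '%s\n' "$result"

rm -rf "$dir"
```

Note that within each file the matches stay in their original order, but the order in which files are visited is up to grep.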
Hope it will help you.
Upvotes: 0