Reputation: 2403
I have a grep operation that yields the following:
VRHFIN =Ce : [core=Xe4] s2d1f1
VRHFIN =O: s2p4
VRHFIN =C: s2p2
VRHFIN =H: ultrasoft test
The objective is to extract the part after the equals sign and before the colon (it indicates a chemical element) from each line, and push these parts into an array in order of appearance. In this specific case the desired array would contain
Ce O C H
How can this be achieved using regular expressions? Thank you in advance.
Upvotes: 2
Views: 397
Reputation: 2122
Nice and simple:
aArray=($(sed -n "s/^\w*\s*=\(\w*\)\s*.*/\1/p" /path/filename))
Upvotes: 1
Reputation: 785098
You can use process substitution to extract the values from the grep output and store them in an array:
#!/bin/bash
arr=()
while IFS=':= ' read -r _ b _; do
    arr+=("$b")
done < <(grep 'pattern' file)
# print array
declare -p arr
# or else
printf "%s\n" "${arr[@]}"
Replace grep 'pattern' file with your actual grep command.
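To see the loop run end to end, here is a minimal self-contained sketch. The `sample` variable and the `grep 'VRHFIN'` pattern are assumptions standing in for your real input and grep command. With `IFS=':= '`, `read` splits each line on colons, equals signs, and spaces, so the second field is the element symbol.

```bash
#!/bin/bash
# Hypothetical sample data standing in for the real grep input.
sample='VRHFIN =Ce : [core=Xe4] s2d1f1
VRHFIN =O: s2p4
VRHFIN =C: s2p2
VRHFIN =H: ultrasoft test'

arr=()
# Field 1 is "VRHFIN", field 2 is the element, the rest is discarded.
while IFS=':= ' read -r _ b _; do
    arr+=("$b")
done < <(printf '%s\n' "$sample" | grep 'VRHFIN')

# Print one element per line.
printf '%s\n' "${arr[@]}"
```

Because the loop feeds from process substitution rather than a pipe, `arr` is filled in the current shell and is still available after the loop ends.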
Upvotes: 1
Reputation: 113834
Let's take this as the test file:
$ cat file
VRHFIN =Ce : [core=Xe4] s2d1f1
VRHFIN =O: s2p4
VRHFIN =C: s2p2
VRHFIN =H: ultrasoft test
other irrelevant lines
here.
If you have GNU grep, which is the default on most Linux distributions, then you can extract the names of the elements like this:
$ grep -oP '(?<==)\w+(?= *:)' file
Ce
O
C
H
You can put those names into a bash array as follows:
elements=($(grep -oP '(?<==)\w+(?= *:)' file))
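The -o option prints only the matched part of each line, and the -P option tells GNU grep to use Perl-style regular expressions. The lookbehind (?<==) requires a = immediately before the match, and the lookahead (?= *:) requires a colon, optionally preceded by spaces, after the match. Neither is included in the output.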
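If grep -P is not available (for example with BSD grep), the same "= before, colon after" constraint can be expressed with bash's built-in =~ operator and a capture group instead of lookarounds. This is a swapped-in technique, not the grep approach itself, and only a sketch: the `sample` data and the names `re` and `elements_re` are assumptions made for illustration.

```bash
#!/bin/bash
# Hypothetical sample data, including a line that should not match.
sample='VRHFIN =Ce : [core=Xe4] s2d1f1
VRHFIN =O: s2p4
VRHFIN =C: s2p2
VRHFIN =H: ultrasoft test
other irrelevant lines'

# "=" then a captured word, then optional spaces, then ":".
# Keeping the regex in a variable avoids quoting pitfalls with =~.
re='=([[:alnum:]_]+) *:'

elements_re=()
while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
        # BASH_REMATCH[1] holds the first capture group.
        elements_re+=("${BASH_REMATCH[1]}")
    fi
done <<< "$sample"

printf '%s\n' "${elements_re[@]}"
```

On the first line, `=Xe4]` is not followed by a colon, so only `=Ce :` matches, just as with the lookahead version.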
We can verify that the array is correct via the declare command:
$ declare -p elements
declare -a elements='([0]="Ce" [1]="O" [2]="C" [3]="H")'
One can obtain the same effect using sed:
$ sed -nE 's/.*=([[:alpha:]]+)[[:space:]]*:.*/\1/p' file
Ce
O
C
H
The results can be stored in a bash array just like before:
$ elements2=($(sed -nE 's/.*=([[:alpha:]]+)[[:space:]]*:.*/\1/p' file))
$ declare -p elements2
declare -a elements2='([0]="Ce" [1]="O" [2]="C" [3]="H")'
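The `array=($(...))` form relies on word splitting and also performs pathname expansion on the output. That is harmless for element symbols, but if you would rather read exactly one array entry per output line, `mapfile -t` (bash 4 or later) is an alternative. A minimal sketch, assuming a temporary file with the same test data; the name `elements3` is just for illustration:

```bash
#!/bin/bash
# Recreate the test file in a temporary location (assumed data).
tmpfile=$(mktemp)
printf '%s\n' \
    'VRHFIN =Ce : [core=Xe4] s2d1f1' \
    'VRHFIN =O: s2p4' \
    'VRHFIN =C: s2p2' \
    'VRHFIN =H: ultrasoft test' \
    'other irrelevant lines' > "$tmpfile"

# -t strips the trailing newline; each output line becomes one entry.
mapfile -t elements3 < <(sed -nE 's/.*=([[:alpha:]]+)[[:space:]]*:.*/\1/p' "$tmpfile")
rm -f "$tmpfile"

declare -p elements3
```

The exact text printed by `declare -p` differs slightly between bash versions, but the array holds the four element symbols in order.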
Upvotes: 3