user32004

Reputation: 2403

Bash: how to extract grep occurrences and store them in an array

I have a grep operation that yields the following:

VRHFIN =Ce : [core=Xe4] s2d1f1
VRHFIN =O: s2p4
VRHFIN =C: s2p2
VRHFIN =H: ultrasoft test

The objective is to extract the part after the equal sign (which indicates a chemical element) and before the colon, and push them into an array in the order of appearance. In this specific case the desired array would contain

Ce O C H

How can this be achieved using regular expressions? Thank you in advance.

Upvotes: 2

Views: 397

Answers (3)

jgshawkey

Reputation: 2122

Nice and simple:

aArray=($(sed -n "s/^\w*\s*=\(\w*\)\s*.*/\1/p" /path/filename))
  • aArray=() - treats the contents as the elements of an array
  • $() - runs the contents as a command and substitutes its output
  • sed -n - suppress the automatic printing of every line
  • s/ - substitute
  • ^ - beginning of line
  • \w* - zero or more word characters
  • \s* - zero or more whitespace characters
  • \( \) - store everything matched inside as \1
  • .* - match the rest of the line
  • /\1 - replace the whole line with the contents of \1
  • p - print the line if the substitution succeeded
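
To see the pieces above working together, here is a minimal sketch that runs the same sed command against the sample lines from the question. The temporary file and the variable name sample are stand-ins for /path/filename and are not part of the original answer. Note that \w and \s are GNU sed extensions, so this assumes GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical sample file standing in for /path/filename.
sample=$(mktemp)
cat > "$sample" <<'EOF'
VRHFIN =Ce : [core=Xe4] s2d1f1
VRHFIN =O: s2p4
VRHFIN =C: s2p2
VRHFIN =H: ultrasoft test
other irrelevant lines
EOF

# Same command as in the answer; lines without "=" do not match
# the pattern, so -n together with p leaves them out of the output.
aArray=($(sed -n "s/^\w*\s*=\(\w*\)\s*.*/\1/p" "$sample"))

# Print one array element per line.
printf '%s\n' "${aArray[@]}"

rm -f "$sample"
```

The unquoted $( ) relies on word splitting to separate the elements, which is fine for element symbols because they contain no whitespace or glob characters.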

Upvotes: 1

anubhava

Reputation: 785098

You can use process substitution to extract the values from the grep output and store them in an array:

#!/bin/bash

arr=()

while IFS=':= ' read -r _ b _; do
   arr+=("$b")
done < <(grep 'pattern' file)

# print array
declare -p arr
# or else
printf "%s\n" "${arr[@]}"

Replace grep 'pattern' file with your actual grep command.
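
As a self-contained sketch of the loop above, the following uses the sample lines from the question. The temporary file and the pattern 'VRHFIN' are assumptions made for illustration; the question does not show the actual grep command.

```shell
#!/usr/bin/env bash
# Hypothetical sample file; the real input comes from your own grep.
sample=$(mktemp)
cat > "$sample" <<'EOF'
VRHFIN =Ce : [core=Xe4] s2d1f1
VRHFIN =O: s2p4
VRHFIN =C: s2p2
VRHFIN =H: ultrasoft test
other irrelevant lines
EOF

arr=()

# IFS=':= ' makes read split each line on colons, equals signs and
# spaces. The first field (VRHFIN) and the remainder of the line go
# to the throwaway variable _, and the second field lands in b.
while IFS=':= ' read -r _ b _; do
    arr+=("$b")
done < <(grep 'VRHFIN' "$sample")   # assumed pattern

declare -p arr

rm -f "$sample"
```

Because the loop reads from a process substitution rather than a pipe, it runs in the current shell and arr is still set after the loop ends.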

Upvotes: 1

John1024

Reputation: 113834

Solution using GNU grep

Let's take this as the test file:

$ cat file
VRHFIN =Ce : [core=Xe4] s2d1f1
VRHFIN =O: s2p4
VRHFIN =C: s2p2
VRHFIN =H: ultrasoft test
other irrelevant lines
here.

If you have GNU grep, which is the default on most Linux distributions, then you can extract the names of the elements like this:

$ grep -oP '(?<==)\w+(?= *:)' file
Ce
O
C
H

You can put those names into a bash array as follows:

elements=($(grep -oP '(?<==)\w+(?= *:)' file))

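The -o option prints only the matched part of each line, and the -P option tells GNU grep to use Perl-style regular expressions. The lookbehind (?<==) requires a = immediately before the match, and the lookahead (?= *:) requires a colon, optionally preceded by spaces, after the match.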

We can verify that the array is correct via the declare command:

$ declare -p elements
declare -a elements='([0]="Ce" [1]="O" [2]="C" [3]="H")'

Solution not requiring GNU grep

One can obtain the same effect using sed:

$ sed -nE 's/.*=([[:alpha:]]+)[[:space:]]*:.*/\1/p' file
Ce
O
C
H

The results can be stored in a bash array just like before:

$ elements2=($(sed -nE 's/.*=([[:alpha:]]+)[[:space:]]*:.*/\1/p' file))
$ declare -p elements2
declare -a elements2='([0]="Ce" [1]="O" [2]="C" [3]="H")'
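
A possible variation, not part of the original answer: with bash 4 or later, mapfile (also known as readarray) can fill the array one line per element, so the command output is not subject to word splitting or filename expansion. The sketch below reuses the sed command from above; the temporary file and the name elements3 are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample file with the same contents as the test file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
VRHFIN =Ce : [core=Xe4] s2d1f1
VRHFIN =O: s2p4
VRHFIN =C: s2p2
VRHFIN =H: ultrasoft test
other irrelevant lines
here.
EOF

# mapfile -t reads each output line into its own array element and
# strips the trailing newline. Assumes bash 4+ (the bash 3.2 shipped
# with macOS does not have mapfile).
mapfile -t elements3 < <(sed -nE 's/.*=([[:alpha:]]+)[[:space:]]*:.*/\1/p' "$sample")

declare -p elements3

rm -f "$sample"
```

For element symbols both forms give the same array. The difference only matters when a line of output could contain spaces or characters such as *.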

Upvotes: 3
