PhysicalChemist

Reputation: 560

Regex Matching for Bash

I have potential inputs that will come in from a read -e -p command in a bash script. For example, the user might type L50CA. Other possibilities the user could type are K117CB, K46CE2, or V9CE1.

I need to break up what was read in. I read it like this:

read -e -p "What first atom? " sel1

Then I would like to make an array like this (but this does not split the input):

arr1=($sel1)

But I need to split the input so that ${arr1[0]} is L, ${arr1[1]} is 50, and ${arr1[2]} is CA.

This separation has to work with the other possible user input formats like the ones listed above. Regex seems to be the way to do this. I can isolate the first two matches of the input with the following regular expressions: ^\D and \d*(?=\w)
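Note that bash's [[ ... =~ ... ]] operator uses POSIX extended regular expressions, which have no \d or \D shorthand and no lookahead such as (?=\w), so those two patterns will not work there as written. A minimal sketch of an equivalent that captures all three components at once, assuming the input is always one uppercase letter, then one or more digits, then a name that starts with a letter and may end in digits:

```shell
#!/usr/bin/env bash
# POSIX ERE version of the three components (no \d, \D or lookahead).
# Assumption: one uppercase letter, then digits, then a name that starts
# with a letter and may end in digits (CA, CB, CE1, CE2, ...).
re='^([[:upper:]])([[:digit:]]+)([[:upper:]][[:upper:][:digit:]]*)$'

sel1=K46CE2
if [[ $sel1 =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match; [1]..[3] are the capture groups.
    arr1=("${BASH_REMATCH[@]:1}")
    echo "${arr1[@]}"        # K 46 CE2
else
    echo "Unrecognised input: $sel1" >&2
fi
```

Keeping the regex in a variable and leaving it unquoted after =~ matters: a quoted right-hand side is compared as a literal string.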

I need help matching the third component and putting the results into an array. Alternatively, it is fine to break the user input into three new variables, or to put a space between the matches so that L50CA becomes L 50 CA, because then arr1=($sel1) will work.
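For the "three new variables" option, the split can also be done with no regex at all, using only bash parameter expansion. A minimal sketch, assuming the first component is always exactly one character; split_sel and the variables letter, number and atom are hypothetical names:

```shell
#!/usr/bin/env bash
# Split "L50CA"-style input with parameter expansion only (no regex).
# Assumption: the first component is a single character and the number
# starts at the second character. Sets the globals letter, number, atom.
split_sel() {
    local sel=$1 rest
    letter=${sel:0:1}                # first character, e.g. L
    rest=${sel:1}                    # remainder, e.g. 50CA
    number=${rest%%[![:digit:]]*}    # cut from the first non-digit: 50
    atom=${rest#"$number"}           # drop the leading number: CA
}

for sel1 in L50CA K117CB K46CE2 V9CE1; do
    split_sel "$sel1"
    echo "$letter $number $atom"
done
```

This prints L 50 CA, K 117 CB, K 46 CE2 and V 9 CE1, keeping the trailing digit of CE2 and CE1 in the third component.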

Thanks for your help.

Upvotes: 0

Views: 73

Answers (3)

steffen

Reputation: 17048

In bash, using GNU expr's match operator:

 ~$ sel1=L50CA
 ~$ part1=$(expr match "$sel1" "\([A-Z]\+\).*")
 ~$ part2=$(expr match "$sel1" "[A-Z]*\([0-9]\+\).*")
 ~$ part3=$(expr match "$sel1" "[A-Z]*[0-9]*\(.*\)")
 ~$ echo $part{1,2,3}
 L 50 CA
 ~$ arr=($part{1,2,3})
 ~$ echo "${arr[@]}"
 L 50 CA
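The match keyword and \+ are GNU extensions of expr. A sketch of the same idea using the POSIX form expr STRING : REGEX, which should also work with non-GNU expr implementations; the patterns are illustrative and assume the same letter/number/name format:

```shell
#!/bin/sh
# POSIX expr: "expr STRING : REGEX" prints what the \(...\) group captured.
# The BRE is implicitly anchored at the start of STRING.
sel1=K46CE2
part1=$(expr "$sel1" : '\([[:upper:]]\)')                         # K
part2=$(expr "$sel1" : '[[:upper:]]\([[:digit:]][[:digit:]]*\)')  # 46
part3=$(expr "$sel1" : '[[:upper:]][[:digit:]]*\(.*\)')           # CE2
echo "$part1 $part2 $part3"
```

Capturing everything after the number with \(.*\) keeps names such as CE2 whole instead of stopping at the first digit.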

Upvotes: 1

Etan Reisner

Reputation: 81032

Bash-only solution:

# Keep the regex in a variable: a quoted pattern after =~ is matched literally.
re='^([A-Z])([0-9]+)(.*)$'
for sel in L50CA K117CB K46CE2 V9CE1; do
    [[ $sel =~ $re ]]
    printf '%s - ' "${BASH_REMATCH[@]}"
    printf '\n'
done
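Because [[ ... =~ ... ]] returns a nonzero status when the input does not match, that status can drive a re-prompt loop around the question's read -e -p. A minimal sketch; read_atom is a hypothetical helper name and the regex assumes the same three-part format:

```shell
#!/usr/bin/env bash
# Re-prompt until the input looks like L50CA / K46CE2; result goes in arr1.
re='^([[:upper:]])([[:digit:]]+)([[:upper:]][[:upper:][:digit:]]*)$'

read_atom() {
    local prompt=$1 sel
    while read -r -e -p "$prompt" sel; do
        if [[ $sel =~ $re ]]; then
            arr1=("${BASH_REMATCH[@]:1}")   # groups 1..3 only
            return 0
        fi
        echo "Unrecognised atom: '$sel' (expected e.g. L50CA)" >&2
    done
    return 1    # EOF reached before a valid entry
}

# Demo with canned input; interactively, call it without the <<< redirection.
read_atom "What first atom? " <<< $'foo\nK46CE2' && echo "${arr1[@]}"
```

With non-terminal input, read skips the prompt and readline, so the same function works when input is piped in.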

Upvotes: 2

clt60

Reputation: 63972

The following loop

for sel in L50CA K117CB K46CE2 V9CE1
do
        arr=($(sed 's/\([0-9][0-9]*\)/ \1 /g'<<<"$sel"))
        echo "${arr[@]}"
done

prints

L 50 CA
K 117 CB
K 46 CE 2
V 9 CE 1

Upvotes: 1
