user2740039

Reputation: 105

Under bash, how to extract two numbers by parsing a string

I searched around a lot but couldn't find a solution to my problem, so I'm posting here.

I need to extract two numbers from a string. The string may or may not contain other numbers besides the two I want to parse.

For instance, the strings may look like:

newSetupSL5_snolab_Int-300_Exp-10000_3515

snolab_Int-300_Exp-10000_1185

newSetupSL5_snolab_Int-300_Exp-5000_2522

So what I want to extract are the numbers after "Int-" and "Exp-": 300 and 10000 in the first and second strings, and 300 and 5000 in the third.

Moreover, I need to use these two numbers for further analysis. That is to say, I want them assigned to two variables rather than just printed out, and this should work inside a bash script, not only as an interactive command line.

Upvotes: 1

Views: 1434

Answers (3)

glenn jackman

Reputation: 246807

Using bash regular expression matching:

while read line; do
    if [[ $line =~ _Int-([[:digit:]]+)_Exp-([[:digit:]]+) ]]; then
        printf "int=%d; exp=%d\n" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
done <<END
newSetupSL5_snolab_Int-300_Exp-10000_3515
snolab_Int-300_Exp-10000_1185
newSetupSL5_snolab_Int-300_Exp-5000_2522
END
int=300; exp=10000
int=300; exp=10000
int=300; exp=5000

Removing the while loop:

str=newSetupSL5_snolab_Int-300_Exp-10000_3515
if [[ $str =~ _Int-([[:digit:]]+)_Exp-([[:digit:]]+) ]]; then
    printf "int=%d; exp=%d\n" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
fi
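
Since the question asks for the two numbers in variables rather than printed, the same regex match can be wrapped as a minimal sketch like the one below. The function name parse_int_exp is made up for illustration, and storing the pattern in a variable (re) is just one common way to avoid quoting surprises with =~.

```bash
#!/usr/bin/env bash

# parse_int_exp is a hypothetical helper name, not part of the answer above.
# On success it sets the global variables int and exp; on failure it returns 1.
parse_int_exp() {
    local re='_Int-([[:digit:]]+)_Exp-([[:digit:]]+)'
    [[ $1 =~ $re ]] || return 1
    int=${BASH_REMATCH[1]}   # first capture group: digits after "Int-"
    exp=${BASH_REMATCH[2]}   # second capture group: digits after "Exp-"
}

str=newSetupSL5_snolab_Int-300_Exp-10000_3515
if parse_int_exp "$str"; then
    # The values are now ordinary shell variables usable in arithmetic.
    echo "int=$int exp=$exp sum=$(( int + exp ))"
else
    echo "no Int-/Exp- pair found in: $str" >&2
fi
```

Checking the return status before using int and exp avoids silently reusing values left over from an earlier match.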

Upvotes: 4

fedorqui

Reputation: 289725

grep can do it with this look-behind expression (the -P option requires GNU grep built with PCRE support):

$ grep -Po '(?<=Int-)\d+|(?<=Exp-)\d+' file
300
10000
300
10000
300
5000

To see it more clearly, note how it fetches the number just after Int-:

$ grep -Po '(?<=Int-)\d+' file
300
300
300

And then it is just a matter of adding the other condition with the |.


Update

Glenn Jackman's great suggestion improves the output:

$ grep -Po '(?<=Int-)\d+|(?<=Exp-)\d+' file | paste - - | while read n1 n2
> do
> echo "int=$n1 ext=$n2"
> done
int=300 ext=10000
int=300 ext=10000
int=300 ext=5000

On OP's comment

@fedorqui and glenn jackman : Thanks a lot for your codes - your code looks very nice. However, as mentioned in my original message, I actually need a line of code to deal with a string, rather a file. And this code line(s) should be integrated into my script. Do you know how to replace the "file" with "$string" ? Thanks a lot !

You can do it as follows:

grep -Po '(?<=Int-)\d+|(?<=Exp-)\d+' <<< "$string"
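
To get the two values into variables instead of printing them, one possible sketch is to read the two output lines of grep through process substitution. This assumes GNU grep with -P support, and that "Int-" appears before "Exp-" in the string so that the first output line is the Int value; the variable names int and exp are illustrative.

```bash
#!/usr/bin/env bash

string=snolab_Int-300_Exp-10000_1185

# grep -o prints each match on its own line, in order of appearance,
# so the first read gets the number after "Int-" and the second the one after "Exp-".
{ read -r int; read -r exp; } < <(grep -Po '(?<=Int-)\d+|(?<=Exp-)\d+' <<< "$string")

echo "int=$int exp=$exp"
```

Using process substitution rather than a pipe keeps the read commands in the current shell, so int and exp are still set after the line finishes.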

Upvotes: 2

F. Hauri - Give Up GitHub

Reputation: 70792

Under bash, there is a way to do this without requiring external tools (forks) like sed, awk or others:

i=0;
while read string ;do
    ((i++))
    int=${string#*Int-}
    int=(${int//[a-z_-]/ })
    exp=${string#*Exp-}
    exp=(${exp//[a-z_-]/ })
    var=(${string//[a-z_-]/ })
    printf "Line #%2d contain: Int: %6s, Exp: %6s in %2d values: <%s>\n" \
        $i "$int" "$exp" ${#var[@]} "${var[*]}"
  done <<<'
newSetupSL5_snolab_Int-300_Exp-10000_3515

snolab_Int-300_Exp-10000_1185

newSetupSL5_snolab_Int-300_Exp-5000_2522
'
Line # 1 contain: Int:       , Exp:        in  0 values: <>
Line # 2 contain: Int:    300, Exp:  10000 in  4 values: <5 300 10000 3515>
Line # 3 contain: Int:       , Exp:        in  0 values: <>
Line # 4 contain: Int:    300, Exp:  10000 in  3 values: <300 10000 1185>
Line # 5 contain: Int:       , Exp:        in  0 values: <>
Line # 6 contain: Int:    300, Exp:   5000 in  4 values: <5 300 5000 2522>
Line # 7 contain: Int:       , Exp:        in  0 values: <>

Or, filtering for lines that contain both Int- and Exp-:

i=0
while read string ;do
    if [ "$string" != "${string#*Int-*Exp-}" ];then
        ((i++))
        int=${string#*Int-}
        int=(${int//[a-z_-]/ })
        exp=${string#*Exp-}
        exp=(${exp//[a-z_-]/ })
        var=(${string//[a-z_-]/ })
        printf "Line #%2d contain: Int: %6s, Exp: %6s in %2d values: <%s>\n" \
            $i "$int" "$exp" ${#var[@]} "${var[*]}"
      fi
  done <<<'
newSetupSL5_snolab_Int-300_Exp-10000_3515

snolab_Int-300_Exp-10000_1185

newSetupSL5_snolab_Int-300_Exp-5000_2522
'
Line # 1 contain: Int:    300, Exp:  10000 in  4 values: <5 300 10000 3515>
Line # 2 contain: Int:    300, Exp:  10000 in  3 values: <300 10000 1185>
Line # 3 contain: Int:    300, Exp:   5000 in  4 values: <5 300 5000 2522>
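
For a single string and plain scalar variables, a variation on the same fork-free idea is sketched below. It swaps the array trick for a second parameter expansion that trims everything from the first non-digit onward; the guard mirrors the filter used above. Variable names are illustrative.

```bash
#!/usr/bin/env bash

string=newSetupSL5_snolab_Int-300_Exp-5000_2522

# Only proceed if the string contains "Int-" followed later by "Exp-".
if [ "$string" != "${string#*Int-*Exp-}" ]; then
    int=${string#*Int-}    # drop everything up to and including "Int-" -> 300_Exp-5000_2522
    int=${int%%[!0-9]*}    # drop from the first non-digit to the end  -> 300
    exp=${string#*Exp-}    # drop everything up to and including "Exp-" -> 5000_2522
    exp=${exp%%[!0-9]*}    # drop from the first non-digit to the end  -> 5000
    echo "int=$int exp=$exp"
fi
```

Without the guard, a string lacking "Int-" would pass through the first expansion unchanged, and int would end up holding whatever leading digits the string happens to have, or nothing at all.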

Upvotes: 0
