Ashok K Harnal

Reputation: 1221

Extracting a substring from a variable using bash script

I have a bash variable with value something like this:

10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

There are no spaces within the value. The value can be very long or very short. It consists of pairs such as 65:3.0. I know the first number of a pair, say 65, and I want to extract either the number 3.0 or the whole pair 65:3.0. I do not know the position (offset) of 65 within the string.

I would be grateful for a bash script that can do such an extraction. Thanks.

Upvotes: 3

Views: 517

Answers (7)

Archemar

Reputation: 571

Try:

echo "$var" | tr , '\n' | awk '/^65:/'

where

  • tr , '\n' turns each comma into a newline
  • awk '/^65:/' picks the line that starts with 65: (a bare /65/ would also match 165:4.0 or 10:65.0)

or

echo "$var" | tr , '\n' | awk -F: '$1 == 65 {print $2}'

where

  • -F: uses : as the field separator
  • $1 == 65 picks the line whose first field is 65
  • {print $2} prints the second field
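The second pipeline can be wrapped in a function when the key is not known in advance. This is only a minimal sketch: the function name lookup_value and the awk variable k are hypothetical names chosen for illustration, and the key is assumed to be a plain integer as in the question.

```shell
# lookup_value is a hypothetical helper name, not part of the answer above.
# Usage: lookup_value KEY STRING
lookup_value() {
    local key=$1 str=$2
    # Put each pair on its own line, then compare the whole first field
    # with the key, so that 65 does not match 165 or 650.
    printf '%s\n' "$str" | tr ',' '\n' | awk -F: -v k="$key" '$1 == k { print $2 }'
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
lookup_value 65 "$var"
```

Passing the key with -v keeps it out of the awk program text, so no shell quoting tricks are needed.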

Upvotes: 3

David C. Rankin

Reputation: 84521

Using sed

sed -e 's/^.*,\(65:[0-9.]*\),.*$/\1/' <<<",$var,"

output:

65:3.0

There are two ways to handle 65:3.0 being the first or the last pair in the string. Above, commas are added around the variable so that the pair is always delimited on both sides. Below, the GNU extension \? is used to make each comma optional (zero or one occurrence).

sed -e 's/^.*,\?\(65:[0-9.]*\),\?.*$/\1/' <<<$var

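Both handle 65:3.0 regardless of where it appears in the string. Note that the second form no longer requires a comma before 65, so it can also match the tail of a longer key such as 165:4.0.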
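A possible variation on the comma-wrapped form, sketched as a function with the key passed in. The name pair_for_key is hypothetical, and the key is assumed to contain only digits because it is pasted into the sed expression. Using -n together with the p flag means nothing is printed when the key is absent, whereas a plain s command would print the whole string unchanged.

```shell
# pair_for_key is a hypothetical helper name used only for illustration.
# Usage: pair_for_key KEY STRING
pair_for_key() {
    local key=$1 str=$2
    # Wrap the string in commas so every pair is delimited on both sides.
    # -n plus the p flag prints only when the substitution succeeded.
    printf ',%s,\n' "$str" | sed -n "s/^.*,\(${key}:[0-9.]*\),.*\$/\1/p"
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
pair_for_key 65 "$var"
```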

Upvotes: 2

Jotne

Reputation: 41446

Here is a GNU awk solution:

awk -vRS="(^|,)65:" -F, 'NR>1{print $1}' <<< "$var"
3.0

Upvotes: 3

SMA

Reputation: 37023

Try egrep (grep -E) like below:

echo "$myvar" | grep -Eo '\b65:[0-9]+\.[0-9]+'

Upvotes: 1

gniourf_gniourf

Reputation: 46813

Here's a pure Bash solution:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

while read -r -d, i; do
    [[ $i = 65:* ]] || continue
    echo "$i"
done <<< "$var,"

You may use break after echo "$i" if there's only one 65:... in var, or if you only want the first one.

To get the value 3.0: echo "${i#*:}".
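Putting the loop, the early exit and the ${i#*:} expansion together might look like the sketch below. The function name first_value and its return-code convention are my own assumptions for illustration; it prints the value of the first matching pair and returns 1 when the key is absent.

```shell
# first_value is a hypothetical helper name; requires bash (read -d, [[ ]]).
# Usage: first_value KEY STRING
first_value() {
    local key=$1 str=$2 pair
    # Read one comma-terminated pair at a time.
    while read -r -d, pair; do
        [[ $pair = "$key":* ]] || continue
        printf '%s\n' "${pair#*:}"   # strip everything up to the first colon
        return 0                     # stop at the first match
    done <<< "$str,"
    return 1                         # key not found
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
first_value 65 "$var"
```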


Another (pure Bash) approach, without parsing the string explicitly. I'm assuming you're only looking for the first 65 in the string, and that it is present in the string:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

# prepend a comma so that a leading 65: is found as well
value=,$var
value=${value#*,65:}
value=${value%%,*}
echo "$value"

This will be very slow for long strings!


Same as above, but will output all the values corresponding to 65 (or none if there are none):

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

tmpvar=,$var
while [[ $tmpvar = *,65:* ]]; do
    tmpvar=${tmpvar#*,65:}
    echo "${tmpvar%%,*}"
done

As above, this will be slow for long strings!


The fastest I can obtain in pure Bash is my original answer (and it's fine with 10000 fields):

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

IFS=, read -ra ary <<< "$var"
for i in "${ary[@]}"; do
    [[ $i = 65:* ]] || continue
    echo "$i"
done

In fact, no, the fastest I can obtain in pure Bash is with this regex:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

[[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"

Timing this against awk:

  • where the 65:3.0 is at the end:

    printf -v var '%s:3.0,' {100..11000}
    var+=65:42.0
    time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
    

    shows 0m0.020s (rough average) whereas:

    time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
    

    shows 0m0.008s (rough average too).

  • where the 65:3.0 is not at the end:

    printf -v var '%s:3.0,' {1..10000}
    time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
    

    shows 0m0.020s (rough average) and with early exit:

    time awk -F: -v RS=',' '$1==65{print $2;exit}' <<< "$var"
    

    shows 0m0.010s (rough average) whereas:

    time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
    

    shows 0m0.002s (rough average).
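For reuse with a key held in a variable, the regex approach could be wrapped as below. This is a sketch only: the name regex_value is hypothetical, and the key is assumed to consist of digits because it is interpolated into the regular expression.

```shell
# regex_value is a hypothetical helper name; requires bash ([[ =~ ]]).
# Usage: regex_value KEY STRING
regex_value() {
    local key=$1 str=$2
    # Store the pattern in a variable and expand it unquoted so that
    # bash treats it as a regex rather than a literal string.
    local re=",${key}:([^,]+),"
    [[ ",${str}," =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
regex_value 65 "$var"
```

The function returns a non-zero status when the key is absent, so it can be used directly in an if.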

Upvotes: 4

αғsнιη

Reputation: 2761

With grep:

grep -o '\b65\b[^,]*' <<<"$var"
65:3.0

Or

grep -oP '\b65\b:\K[^,]*' <<<"$var"
3.0

\K discards everything matched before it from the reported match, so only the text after 65: is printed. It requires grep's Perl-compatible mode (-P).
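Since \b also matches between a colon and a digit, a pair such as 10:65.0 would produce a hit for 65 as well. A possible alternative, sketched under the assumption that the key is made of digits only, is to accept the key only at the start of the string or right after a comma. The function name pair_with_grep is hypothetical, and only -o and -E are used, so -P is not needed.

```shell
# pair_with_grep is a hypothetical helper name used only for illustration.
# Usage: pair_with_grep KEY STRING
pair_with_grep() {
    local key=$1 str=$2
    # Match the key at the start of the string or right after a comma,
    # then strip the comma that grep -o reports as part of the match.
    printf '%s\n' "$str" | grep -oE "(^|,)${key}:[^,]*" | sed 's/^,//'
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
pair_with_grep 65 "$var"
```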

Upvotes: 3

user000001

Reputation: 33307

Probably awk is the most straightforward approach:

awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
3.0

Or to get the pair:

awk -F: -v RS=',' '$1==65' <<< "$var"
65:3.0
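If the key lives in a shell variable, it can be handed to awk with -v instead of being hard-coded. The sketch below is one way to do that; the function name awk_value and the awk variables k and found are hypothetical. It also strips the newline that ends the last record (with RS=',' the final pair carries the trailing newline of the input) and reports a missing key through the exit status.

```shell
# awk_value is a hypothetical helper name used only for illustration.
# Usage: awk_value KEY STRING
awk_value() {
    local key=$1 str=$2
    printf '%s\n' "$str" | awk -F: -v RS=',' -v k="$key" '
        $1 == k {
            sub(/\n$/, "", $2)   # the last record ends with a newline
            print $2
            found = 1
            exit                 # stop reading after the first match
        }
        END { exit !found }      # status 1 when the key was not seen
    '
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
awk_value 65 "$var"
```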

Upvotes: 5
