TheWaterProgrammer

Reputation: 8229

How to extract a valid number from a string even if it is followed by zeroes

A question for sed and awk experts.

Suppose I have a string like ABCDEF00012300XYZ. I want to extract the number that follows the letters and the leading zeroes, so I want to get 12300 from this string.

In other words, I just want to extract the valid number in the string: 00012300 means 12300 in the mathematical sense.

I tried the following:

STR=ABCDEF00012300XYZ
VALID_NUMBER="$(echo $STR | awk '{sub(/.*0+/,"");sub(/[a-zA-Z]+/,"")} 1')"

The above works if I pass ABCDEF000123XYZ: it extracts 123 from STR. But it fails if 123 is followed by zeroes, in which case it should return 12300.

Note that I am using sed on Linux.

Upvotes: 2

Views: 51

Answers (5)

ctac_

Reputation: 2471

With parameter expansion:

str="ABCDEF00012300XYZ"
inter="${str%${str#*[[:digit:]]}}"
str="${str#${inter%[[:digit:]]}}"
inter="${str%${str#*[![:digit:]]}}"
str="${str%${str#${inter%[![:digit:]]}}}"
inter="${str%${str#*[1-9]}}"
str="${str#${inter%[1-9]}}"
echo "valid_number = $str"
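
The same idea can probably be written more compactly with the longest-match operator `%%`. This is a sketch, not the code above: the function name `extract_number` is made up for illustration, and it assumes the number of interest is the first run of digits in the string. It uses only POSIX parameter expansion, so it should not need bash.

```shell
# Hypothetical helper; note that s is a global variable in plain POSIX sh.
extract_number() {
    s=$1
    s=${s#"${s%%[0-9]*}"}   # drop everything before the first digit
    s=${s%%[!0-9]*}         # drop everything from the first non-digit on
    s=${s#"${s%%[1-9]*}"}   # drop the leading zeroes
    printf '%s\n' "$s"
}

extract_number ABCDEF00012300XYZ   # prints 12300
```

If the string has no digits, or its first run of digits is all zeroes, the function prints an empty line.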

Upvotes: 1

James Brown

Reputation: 37404

Another awk:

$ awk '
match($0,/[1-9][0-9]*/) {            # match the first run of digits that starts with 1-9
    print substr($0,RSTART,RLENGTH)  # and print it
}' <<< ABCDEF00012300XYZ             # or you could echo ... | awk ...
12300

Or sed:

$ sed -E 's/(^[^1-9]*|[^0-9]+$)//g' <<< ABCDEF00012300XYZ
12300

That sed script deletes the leading run of characters that are not 1-9 (the letters and the leading zeroes) and the trailing run of non-digits.
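
One difference between the two commands is worth checking: the sed version trims from both ends of the line, while the awk version stops at the first number. A small sketch with a made-up second input (the function names are illustrative only):

```shell
# Illustrative wrappers around the two commands above.
first_number_awk() {
    printf '%s\n' "$1" | awk 'match($0, /[1-9][0-9]*/) { print substr($0, RSTART, RLENGTH) }'
}
first_number_sed() {
    printf '%s\n' "$1" | sed -E 's/(^[^1-9]*|[^0-9]+$)//g'
}

first_number_awk ABCDEF00012300XYZ   # 12300
first_number_sed ABCDEF00012300XYZ   # 12300
first_number_awk ABC00123XYZ45       # 123
first_number_sed ABC00123XYZ45       # 123XYZ45 (letters in the middle are not trimmed)
```

So the sed form seems safe only when a single number is surrounded by non-digits, as in the question's example.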

Upvotes: 2

Tyl

Reputation: 5252

Another GNU awk solution:

$ STR=ABCDEF00012300XYZ

$ awk -v str="$STR" 'BEGIN{print gensub(/[A-Za-z0]+([0-9]+).*/, "\\1", 1, str)}'
12300

However, if the number is not necessarily preceded only by letters and zeroes, this is more robust:

awk -v str="$STR" 'BEGIN{print gensub(/[^1-9]*([0-9]+).*/, "\\1", 1, str)}'
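
gensub() is a gawk extension and is not part of POSIX awk, so other awk implementations may not provide it. If portability matters, a rough equivalent of the second command can be built from two plain sub() calls. This is a sketch under that assumption, with a made-up function name and a made-up second input:

```shell
# Hypothetical POSIX-awk variant: no gensub(), just two sub() calls.
number_posix_awk() {
    awk -v str="$1" 'BEGIN {
        sub(/^[^1-9]*/, "", str)   # drop everything before the first digit 1-9
        sub(/[^0-9].*/, "", str)   # drop everything from the first non-digit on
        print str
    }'
}

number_posix_awk ABCDEF00012300XYZ   # 12300
number_posix_awk ID-00012300/XYZ     # 12300
```

Note that awk interprets backslash escapes in values passed with -v, so inputs containing backslashes would need different handling.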

Upvotes: 1

RavinderSingh13

Reputation: 133458

Could you please try the following (tested with GNU awk):

echo "ABCDEF00012300XYZ" |
awk '
  match($0,/[a-zA-Z]+0+[0-9]+/){
    val=substr($0,RSTART,RLENGTH)
    sub(/^[a-zA-Z]+0+/,"",val)
    print val
    val=""
}'

Explanation of the above code:

echo "ABCDEF00012300XYZ" |               ## Print the string with echo and pipe it to awk as standard input.
awk '                                    ## Start the awk program.
  match($0,/[a-zA-Z]+0+[0-9]+/){         ## Use awk's built-in match function to find letters, then one or more zeroes, then digits.
    val=substr($0,RSTART,RLENGTH)        ## Set val to the matched substring, which starts at RSTART and is RLENGTH characters long.
    sub(/^[a-zA-Z]+0+/,"",val)           ## Remove the leading letters and the zeroes that follow them, leaving the number in val.
    print val                            ## Print val.
    val=""                               ## Reset val before the next line is read.
}'                                       ## Close the action block and the awk program.

Upvotes: 1

Wiktor Stribiżew

Reputation: 626747

You may use sed:

VALID_NUMBER="$(sed 's/^[A-Z0]*\([0-9]*\).*/\1/' <<< "$STR")"

See an online sed demo.

The ^[A-Z0]*\([0-9]*\).* pattern will match:

  • ^ - start of a line
  • [A-Z0]* - any uppercase letters or zeros, 0 or more repetitions
  • \([0-9]*\) - this will capture 0 or more digits into Group 1
  • .* - this will match the rest of the line.

Then, the \1 in the replacement pattern will only keep the number you need in the output.
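
Because `[A-Z0]*` is anchored at the start of the line, the command prints an empty line when the prefix contains anything else, such as lowercase letters. If that case can occur, one possible variation (an assumption about the input, not part of the answer above) is to skip every character that is not 1-9 instead. The function name `leading_number` is illustrative only.

```shell
# Illustrative variant: skip any prefix that contains no digit 1-9.
leading_number() {
    printf '%s\n' "$1" | sed 's/^[^1-9]*\([0-9]*\).*/\1/'
}

leading_number ABCDEF00012300XYZ   # 12300
leading_number abcdef00012300xyz   # 12300
```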

Upvotes: 3
