Reputation: 8229
A question for sed and awk experts.
Suppose I have a string like this: ABCDEF00012300XYZ.
I want to extract the number that follows the letters and the leading zeroes, so I want to extract 12300 from the string.
In other words, I just want to extract the valid number in the string: 00012300 means 12300 in the mathematical sense.
I tried the following:
STR=ABCDEF00012300XYZ
VALID_NUMBER="$(echo $STR | awk '{sub(/.*0+/,"");sub(/[a-zA-Z]+/,"")} 1')"
The above works if I pass ABCDEF000123XYZ: it extracts 123 from STR. But it fails if 123 is followed by zeroes, in which case it should return 12300.
Note that I am using sed on Linux.
Upvotes: 2
Views: 51
Reputation: 2471
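With parameter expansion:
str="ABCDEF00012300XYZ"
# drop everything before the first digit
inter="${str%${str#*[[:digit:]]}}"
str="${str#${inter%[[:digit:]]}}"
# drop everything after the run of digits
inter="${str%${str#*[![:digit:]]}}"
str="${str%${str#${inter%[![:digit:]]}}}"
# drop the leading zeros (everything before the first 1-9)
inter="${str%${str#*[1-9]}}"
str="${str#${inter%[1-9]}}"
echo "valid_number = $str"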
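For reuse, the same three steps could be wrapped in a function. This is only a sketch: the name extract_number is made up here, and it assumes the input has the same shape as in the question (letters, then digits, then letters).

```shell
# extract_number is a hypothetical name; the body repeats the three steps above.
# The function body is a subshell ( ... ), so str and inter do not leak into the caller.
extract_number() (
  str=$1
  # drop everything before the first digit
  inter="${str%${str#*[[:digit:]]}}"
  str="${str#${inter%[[:digit:]]}}"
  # drop everything after the run of digits
  inter="${str%${str#*[![:digit:]]}}"
  str="${str%${str#${inter%[![:digit:]]}}}"
  # drop the leading zeros
  inter="${str%${str#*[1-9]}}"
  str="${str#${inter%[1-9]}}"
  printf '%s\n' "$str"
)

extract_number ABCDEF00012300XYZ   # prints 12300
extract_number ABCDEF000123XYZ     # prints 123
```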
Upvotes: 1
Reputation: 37404
Another awk:
$ awk '
match($0,/[1-9][0-9]*/) {            # match the first run of digits that starts with a non-zero digit
    print substr($0,RSTART,RLENGTH)  # and print it
}' <<< ABCDEF00012300XYZ             # or you could echo ... | awk ...
12300
Or sed:
$ sed -E 's/(^[^1-9]*|[^0-9]+$)//g' <<< ABCDEF00012300XYZ
12300
That sed script removes every character other than 1-9 from the beginning of the line ([^1-9]) and every non-digit from the end ([^0-9]).
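To see the two halves of that alternation separately, they can be run as two substitutions. A small sketch, where strip_head and strip_tail are names invented for this illustration:

```shell
# Hypothetical helpers: each one applies a single branch of the alternation.
strip_head() { printf '%s\n' "$1" | sed -E 's/^[^1-9]*//'; }   # leading letters and zeros
strip_tail() { printf '%s\n' "$1" | sed -E 's/[^0-9]+$//'; }   # trailing non-digits

strip_head ABCDEF00012300XYZ   # prints 12300XYZ
strip_tail 12300XYZ            # prints 12300
```

The trailing zeros survive because the end branch only removes non-digits.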
Upvotes: 2
Reputation: 5252
Another GNU awk solution:
$ STR=ABCDEF00012300XYZ
$ awk -v str="$STR" 'BEGIN{print gensub(/[A-Za-z0]+([0-9]+).*/, "\\1", 1, str)}'
12300
However, if the number is not necessarily preceded only by letters and zeros, this is better:
awk -v str="$STR" 'BEGIN{print gensub(/[^1-9]*([0-9]+).*/, "\\1", 1, str)}'
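The difference between the two patterns shows up when the prefix contains something other than letters and zeros. The sketch below uses sed -E in place of gensub() so that it does not depend on GNU awk; the regexes are the same, while the helper names and the sample input with a `--` prefix are assumptions made for illustration.

```shell
# Hypothetical helpers; sed -E stands in for gensub() with the same regexes.
letters_zeros() { printf '%s\n' "$1" | sed -E 's/[A-Za-z0]+([0-9]+).*/\1/'; }
non_1_9()       { printf '%s\n' "$1" | sed -E 's/[^1-9]*([0-9]+).*/\1/'; }

letters_zeros '--00012300XYZ'   # prints --12300 (the dashes are not part of the match)
non_1_9 '--00012300XYZ'         # prints 12300
```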
Upvotes: 1
Reputation: 133458
Could you please try the following (tested with GNU awk).
echo "ABCDEF00012300XYZ" |
awk '
match($0,/[a-zA-Z]+0+[0-9]+/){
val=substr($0,RSTART,RLENGTH)
gsub(/[a-zA-Z]+[^1-9]0+/,"",val)
print val
val=""
}'
Explanation of the above code:
echo "ABCDEF00012300XYZ" | ## Print the string with echo and pipe it to awk as standard input.
awk ' ## Start the awk program.
match($0,/[a-zA-Z]+0+[0-9]+/){ ## Use awk's built-in match() to find a run of letters, then a run of zeros, then a run of digits.
val=substr($0,RSTART,RLENGTH) ## Store the matched substring (RLENGTH characters starting at RSTART) in val.
gsub(/[a-zA-Z]+[^1-9]0+/,"",val) ## Remove the letters and the zeros that follow them from val, leaving the remaining digits.
print val ## Print val.
val="" ## Reset val.
}' ## Close the block and the awk program.
Upvotes: 1
Reputation: 626747
You may use sed:
VALID_NUMBER="$(sed 's/^[A-Z0]*\([0-9]*\).*/\1/' <<< "$STR")"
See an online sed demo.
The ^[A-Z0]*\([0-9]*\).* pattern will match:
^ - start of a line
[A-Z0]* - any uppercase letters or zeros, 0 or more repetitions
\([0-9]*\) - this will capture 0 or more digits into Group 1
.* - this will match the rest of the line.
Then, the \1 in the replacement pattern will only keep the number you need in the output.
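As a quick check of the pieces described above, the command can be wrapped in a function and run on a few inputs. This is a sketch: the function name valid_number and the digit-free input ABCXYZ are assumptions added for illustration.

```shell
# valid_number is a hypothetical wrapper around the sed command above.
valid_number() {
  printf '%s\n' "$1" | sed 's/^[A-Z0]*\([0-9]*\).*/\1/'
}

valid_number ABCDEF00012300XYZ   # prints 12300
valid_number ABCDEF000123XYZ     # prints 123
valid_number ABCXYZ              # prints an empty line: Group 1 matched zero digits
```

Because Group 1 is `[0-9]*`, the substitution always succeeds, so an input without digits gives an empty result rather than the unchanged line.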
Upvotes: 3