Reputation: 14745
Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.
So to emphasize the point, I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.
I am very interested in the number of different ways that this can be accomplished.
Upvotes: 1184
Views: 2024107
Reputation: 507245
Generic solution where the number can be anywhere in the filename, using the first of such sequences:
number=$(echo "$filename" | grep -Eo '[[:digit:]]{5}' | head -n1)
Another solution to extract exactly a part of a variable:
number="${filename:offset:length}"
If your filename always has the format stuff_digits_... you can use awk:
number=$(echo "$filename" | awk -F _ '{ print $2 }')
Yet another solution, which removes everything except digits:
number=$(echo "$filename" | tr -cd '[:digit:]')
Upvotes: 138
Reputation: 738
Lots of outdated solutions to this problem that require pipes and subshells.
Since version 3 of bash (released in 2004), it has a built-in regular expression comparison operator =~.
input="someletters_12345_moreleters.ext"
# match: underscore followed by 1 or more digits followed by underscore
[[ $input =~ _([0-9]+)_ ]]
echo ${BASH_REMATCH[1]}
Output:
12345
Note, if you're not very proficient in writing RegExp's I recommend reading Mastering Regular Expressions.
If you just need to figure out how to get your RegExp to work, and it's not matching the way you think, try the online GUI at RegEx101.com and set your "Flavor" to "PCRE" so you get the POSIX style character classes like [[:digit:]] that bash uses.
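A minimal sketch of wrapping the =~ match in a function, so the caller can tell a match from a non-match. The name extract_digits is hypothetical, and the pattern assumes the digits sit between two underscores, as in the question.

```shell
# extract_digits is a hypothetical helper name, not something bash provides.
extract_digits() {
    local re='_([0-9]+)_'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # first capture group
    else
        return 1                             # signal "no match" to the caller
    fi
}

number=$(extract_digits "someletters_12345_moreleters.ext")
echo "$number"
```

Keeping the pattern in a variable avoids quoting surprises inside [[ ]], and the non-zero return status lets callers write extract_digits "$f" || handle_error.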
Upvotes: 8
Reputation: 1301
An easy way to use sed replace:
result=$(echo "someletters_12345_moreleters.ext" | sed 's/.*_\(.*\)_.*/\1/g')
echo $result
Upvotes: 2
Reputation: 4505
I love sed's capability to deal with regex groups:
> var="someletters_12345_moreletters.ext"
> digits=$( echo "$var" | sed "s/.*_\([0-9]\+\).*/\1/p" -n )
> echo $digits
12345
A slightly more general option would be not to assume that you have an underscore _ marking the start of your digits sequence, hence for instance stripping off all non-numbers you get before your sequence: s/[^0-9]\+\([0-9]\+\).*/\1/p.
> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to
refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
More on this, in case you're not too confident with regexps:
s is for _s_ubstitute
[0-9]+ matches 1+ digits
\1 links to the group n.1 of the regex output (group 0 is the whole match, group 1 is the match within parentheses in this case)
p flag is for _p_rinting
All escapes \ are there to make sed's regexp processing work.
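As a hedged sketch of the same idea with extended regular expressions, which avoid most of the backslashes: this assumes a sed that supports -E (GNU and BSD sed both do). With -n and the p flag, nothing is printed when the pattern does not match, so the variable ends up empty instead of holding the unmodified input.

```shell
var="someletters_12345_moreletters.ext"
# -E: extended regex, so groups and + need no backslashes
# -n plus the p flag: print only lines where the substitution happened
digits=$(printf '%s\n' "$var" | sed -nE 's/.*_([0-9]+)_.*/\1/p')
echo "$digits"

# Input without the _digits_ pattern: the result is an empty string
nomatch=$(printf '%s\n' "no-digits-here.ext" | sed -nE 's/.*_([0-9]+)_.*/\1/p')
echo "no match gives: '$nomatch'"
```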
Upvotes: 11
Reputation: 76
Here is a substring.sh file
Usage
`substring.sh $TEXT 2 3` # characters 2-3
`substring.sh $TEXT 2` # characters 2 and after
substring.sh follows this line
#echo "starting substring"
chars=$1
start=$(($2))
end=$3
i=0
o=""
if [[ -z $end ]]; then
end=`echo "$chars " | wc -c`
else
end=$((end))
fi
#echo "length is " $e
a=`echo $chars | sed 's/\(.\)/\1 /g'`
#echo "a is " $a
for c in $a
do
#echo "substring" $i $e $c
if [[ i -lt $start ]]; then
: # DO Nothing
elif [[ i -gt $end ]]; then
break;
else
o="$o$c"
fi
i=$(($i+1))
done
#echo substring returning $o
echo $o
Upvotes: 0
Reputation: 995
Maybe this will help you get the desired output.
Code :
your_number=$(echo "someletters_12345_moreleters.ext" | grep -E -o '[0-9]{5}')
echo $your_number
Output :
12345
Upvotes: 4
Reputation: 42154
You can use Parameter Expansion to do this.
If a is constant, the following parameter expansion performs substring extraction:
b=${a:12:5}
where 12 is the offset (zero-based) and 5 is the length
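When the prefix length is not known in advance, the offset can be computed rather than hard-coded. A small sketch, assuming the digits start right after the first underscore (the variable names prefix and offset are only illustrative):

```shell
a="someletters_12345_moreleters.ext"
prefix=${a%%_*}               # everything before the first "_": "someletters"
offset=$(( ${#prefix} + 1 ))  # skip the prefix and the underscore itself
b=${a:offset:5}               # five characters starting at the computed offset
echo "$b"
```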
If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix (respectively) in two steps:
tmp=${a#*_} # remove prefix ending in "_"
b=${tmp%_*} # remove suffix starting with "_"
If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone knows how to perform both expansions in a single expression, I'd like to know too.
Both solutions presented are pure bash, with no process spawning involved, hence very fast.
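For input with additional underscores, one possible sketch is to strip the shortest prefix and then the longest suffix. It assumes the extra underscores appear only after the digit block; if they come before the digits, a regex-based approach is probably simpler.

```shell
a="someletters_12345_more_letters_here.ext"
tmp=${a#*_}    # remove shortest prefix ending in "_"   -> 12345_more_letters_here.ext
b=${tmp%%_*}   # remove longest suffix starting with "_" -> 12345
echo "$b"
```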
Upvotes: 1668
Reputation: 22379
Inclusive end index, similar to JS and Java implementations. Remove the +1 if you do not want this.
function substring() {
local str="$1" start="${2}" end="${3}"
if [[ "$start" == "" ]]; then start="0"; fi
if [[ "$end" == "" ]]; then end="${#str}"; fi
local length="((${end}-${start}+1))"
echo "${str:${start}:${length}}"
}
Example:
substring 01234 0
01234
substring 012345 0
012345
substring 012345 0 0
0
substring 012345 1 1
1
substring 012345 1 2
12
substring 012345 0 1
01
substring 012345 0 2
012
substring 012345 0 3
0123
substring 012345 0 4
01234
substring 012345 0 5
012345
More example calls:
substring 012345 0
012345
substring 012345 1
12345
substring 012345 2
2345
substring 012345 3
345
substring 012345 4
45
substring 012345 5
5
substring 012345 6
substring 012345 3 5
345
substring 012345 3 4
34
substring 012345 2 4
234
substring 012345 1 3
123
Upvotes: 1
Reputation: 583
shell cut - print specific range of characters or given part from a string
#method1) using bash
str=2020-08-08T07:40:00.000Z
echo ${str:11:8}
#method2) using cut
str=2020-08-08T07:40:00.000Z
cut -c12-19 <<< $str
#method3) when working with awk
str=2020-08-08T07:40:00.000Z
awk '{time=gensub(/.{11}(.{8}).*/,"\\1","g",$1); print time}' <<< $str
Upvotes: 9
Reputation: 9235
Here's how I'd do it:
FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}
Explanation:
Bash-specific:
[[ ]] indicates a conditional expression
=~ indicates the condition is a regular expression
&& chains the commands if the prior command was successful
Regular Expressions (RE): _([[:digit:]]{5})_
_ are literals to demarcate/anchor matching boundaries for the string being matched
() create a capture group
[[:digit:]] is a character class, I think it speaks for itself
{5} means exactly five of the prior character, class (as in this example), or group must match
In English, you can think of it behaving like this: the FN string is iterated character by character until we see an _, at which point the capture group is opened and we attempt to match five digits. If that matching is successful to this point, the capture group saves the five digits traversed. If the next character is an _, the condition is successful, the capture group is made available in BASH_REMATCH, and the next NUM= statement can execute. If any part of the matching fails, saved details are disposed of and character by character processing continues after the _. E.g. if FN were _1 _12 _123 _1234 _12345_, there would be four false starts before it found a match.
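The false-start behaviour described in this explanation can be observed directly. A short sketch using the example string from the explanation, with an explicit branch for the no-match case:

```shell
FN='_1 _12 _123 _1234 _12345_'
if [[ ${FN} =~ _([[:digit:]]{5})_ ]]; then
    NUM=${BASH_REMATCH[1]}     # only the five-digit block qualifies
    echo "matched: ${NUM}"
else
    echo "no match"
fi
```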
Upvotes: 64
Reputation: 99
Given test.txt is a file containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
cut -b19-20 test.txt > test1.txt # This will extract chars 19 & 20 "ST"
while read -r; do
x=$REPLY
done < test1.txt
echo $x
ST
Upvotes: 9
Reputation: 2095
My answer gives you more control over what you extract from your string. Here is code that extracts 12345 from your string:
str="someletters_12345_moreleters.ext"
str=${str#*_}
str=${str%_more*}
echo $str
This works better if you want to extract something that contains arbitrary characters such as abc, or special characters like _ or -. For example, if your string looks like this and you want everything after someletters_ and before _moreleters.ext:
str="someletters_123-45-24a&13b-1_moreleters.ext"
With my code you can mention what exactly you want. Explanation:
#* removes the preceding string, including the matching key. Here the key we mentioned is _
% removes the following string, including the matching key. Here the key we mentioned is '_more*'
Do some experiments yourself and you would find this interesting.
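As a starting point for such experiments, here is the two-step removal applied to the longer example string. Single quotes are used so that the & is taken literally.

```shell
str='someletters_123-45-24a&13b-1_moreleters.ext'
str=${str#*_}       # drop the shortest prefix ending in "_"
str=${str%_more*}   # drop the shortest suffix starting with "_more"
echo "$str"
```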
Upvotes: 13
Reputation: 7207
In case someone wants more rigorous information, you can also search for it in man bash like this
$ man bash [press return key]
/substring [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]
Result:
${parameter:offset}
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of parameter starting at the character specified by offset. If length is omitted, expands to the substring of parameter starting at the character specified by offset. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below). If offset evaluates to a number less than zero, the value is used as an offset from the end of the value of parameter. Arithmetic expressions starting with a - must be separated by whitespace from the preceding : to be distinguished from the Use Default Values expansion. If length evaluates to a number less than zero, and parameter is not @ and not an indexed or associative array, it is interpreted as an offset from the end of the value of parameter rather than a number of characters, and the expansion is the characters between the two offsets. If parameter is @, the result is length positional parameters beginning at offset. If parameter is an indexed array name subscripted by @ or *, the result is the length members of the array beginning with ${parameter[offset]}. A negative offset is taken relative to one greater than the maximum index of the specified array. Substring expansion applied to an associative array produces undefined results. Note that a negative offset must be separated from the colon by at least one space to avoid being confused with the :- expansion. Substring indexing is zero-based unless the positional parameters are used, in which case the indexing starts at 1 by default. If offset is 0, and the positional parameters are used, $0 is prefixed to the list.
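A few of the cases from the manual excerpt, sketched against the filename from the question. The negative-length form is assumed to need bash 4.2 or later; the variable names are only illustrative.

```shell
s="someletters_12345_moreleters.ext"
rest=${s:12}      # from offset 12 to the end of the string
mid=${s:12:5}     # five characters starting at offset 12
ext=${s: -3}      # last three characters; note the space before the minus sign
base=${s:0:-4}    # negative length: stop four characters before the end
printf '%s\n' "$rest" "$mid" "$ext" "$base"
```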
Upvotes: 39
Reputation:
A bash solution:
IFS="_" read -r x digs x <<<'someletters_12345_moreleters.ext'
This will clobber a variable called x. The var x could be changed to the var _.
input='someletters_12345_moreleters.ext'
IFS="_" read -r _ digs _ <<<"$input"
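A variation on the same idea, sketched with read -a so that all fields land in an array and no placeholder variable is needed. The array name parts is only illustrative.

```shell
input='someletters_12345_moreleters.ext'
IFS=_ read -r -a parts <<<"$input"   # IFS is changed only for this one read
digs=${parts[1]}                     # arrays are zero-based, so index 1 is the second field
echo "$digs"
```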
Upvotes: 3
Reputation:
If we focus on the concept of:
"A run of (one or several) digits"
We could use several external tools to extract the numbers.
We could quite easily erase all other characters, with either sed or tr:
name='someletters_12345_moreleters.ext'
echo $name | sed 's/[^0-9]*//g' # 12345
echo $name | tr -c -d 0-9 # 12345
But if $name contains several runs of numbers, the above will fail:
If "name=someletters_12345_moreleters_323_end.ext", then:
echo $name | sed 's/[^0-9]*//g' # 12345323
echo $name | tr -c -d 0-9 # 12345323
We need to use regular expressions (regex).
To select only the first run (12345 not 323) in sed and perl:
echo $name | sed 's/[^0-9]*\([0-9]\{1,\}\).*$/\1/'
perl -e 'my ($num) = $ARGV[0] =~ /(\d+)/; print "$num\n";' "$name"
But we could as well do it directly in bash(1) :
regex='[^0-9]*([0-9]{1,}).*$'
[[ $name =~ $regex ]] && echo ${BASH_REMATCH[1]}
This allows us to extract the FIRST run of digits of any length
surrounded by any other text/characters.
Note: regex='[^0-9]*([0-9]{5,5}).*$'
will capture exactly five digits (the first five of any run that is at least five digits long). :-)
(1): faster than calling an external tool for each short texts. Not faster than doing all processing inside sed or awk for large files.
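If every run of digits is wanted rather than only the first, the bash regex can be applied in a loop. This is a sketch, not part of the approach above: the names re, rest and runs are illustrative, and it relies on the second capture group holding whatever follows the run just matched.

```shell
name='someletters_12345_moreleters_323_end.ext'
re='([0-9]+)(.*)'   # group 1: a run of digits, group 2: everything after it
rest=$name
runs=()
while [[ $rest =~ $re ]]; do
    runs+=("${BASH_REMATCH[1]}")   # keep the digit run just found
    rest=${BASH_REMATCH[2]}        # continue searching after it
done
printf '%s\n' "${runs[@]}"
```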
Upvotes: 17
Reputation: 97
Ok, here goes pure Parameter Substitution with an empty string. The caveat is that I have defined someletters and moreletters as letters only. If they are alphanumeric, this will not work as it is. The pattern also needs extglob to be enabled.
shopt -s extglob
filename=someletters_12345_moreletters.ext
substring=${filename//@(+([a-z])_|_+([a-z]).*)}
echo $substring
12345
Upvotes: 4
Reputation: 36827
Use cut:
echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2
More generic:
INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo $INPUT| cut -d'_' -f 2)
echo $SUBSTRING
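A sketch of the same cut call with the input quoted and passed by here-string, plus a case showing the assumption it rests on: the digits must be the second underscore-separated field. The variable OTHER is only illustrative.

```shell
INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(cut -d_ -f2 <<<"$INPUT")
echo "$SUBSTRING"

# With an extra underscore in the prefix, field 2 is no longer the digits.
OTHER=$(cut -d_ -f2 <<<'some_letters_12345_moreleters.ext')
echo "$OTHER"
```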
Upvotes: 945
Reputation: 290225
Following the requirements
I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.
I found some grep ways that may be useful:
$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+"
12345
or better
$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}"
12345
And then with -Po syntax:
$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+'
12345
Or if you want to make it fit exactly 5 characters:
$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}'
12345
Finally, to store the result in a variable, just use the var=$(command) syntax.
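Put together, a minimal sketch of capturing the grep output in a variable. The head -n1 is an addition here, assumed useful in case the name contains more than one five-digit run, and the -o option is assumed to be available (it is in GNU and BSD grep).

```shell
filename="someletters_12345_moreleters.ext"
# -E: extended regex, -o: print only the matching part, head: keep the first match
number=$(printf '%s\n' "$filename" | grep -Eo '[[:digit:]]{5}' | head -n1)
echo "$number"
```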
Upvotes: 14
Reputation: 708
A little late, but I just ran across this problem and found the following:
host:/tmp$ asd=someletters_12345_moreleters.ext
host:/tmp$ echo `expr $asd : '.*_\(.*\)_'`
12345
host:/tmp$
I used it to get millisecond resolution on an embedded system that does not have %N for date:
set `grep "now at" /proc/timer_list`
nano=$3
fraction=`expr $nano : '.*\(...\)......'`
$debug nano is $nano, fraction is $fraction
Upvotes: 0
Reputation: 12945
similar to substr('abcdefg', 2-1, 3) in php:
echo 'abcdefg'|tail -c +2|head -c 3
Upvotes: 4
Reputation: 12745
I'm surprised this pure bash solution didn't come up:
a="someletters_12345_moreleters.ext"
IFS="_"
set $a
echo $2
# prints 12345
You probably want to reset IFS to what value it was before, or unset IFS afterwards!
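One way to avoid restoring IFS by hand is to confine the change to a function. This is a sketch with a hypothetical helper name, second_field; inside a function, local IFS and set -- affect only that function's scope and positional parameters.

```shell
# second_field is a hypothetical helper name.
second_field() {
    local IFS=_   # the change to IFS ends when the function returns
    set -- $1     # deliberately unquoted so the value is split on "_"
    printf '%s\n' "$2"
}

a="someletters_12345_moreleters.ext"
second_field "$a"
```

Because the expansion is unquoted, a value containing glob characters such as * could also be expanded against files in the current directory, so this is best kept to inputs you control.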
Upvotes: 26
Reputation: 119
Here's a prefix-suffix solution (similar to the solutions given by JB and Darron) that matches the first block of digits and does not depend on the surrounding underscores:
str='someletters_12345_morele34ters.ext'
s1="${str#"${str%%[[:digit:]]*}"}" # strip off non-digit prefix from str
s2="${s1%%[^[:digit:]]*}" # strip off non-digit suffix from s1
echo "$s2" # 12345
Upvotes: 11
Reputation: 21620
Without any sub-processes you can:
shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}
A very small variant of this will also work in ksh93.
Upvotes: 13
Reputation: 711
There's also the 'expr' command (an external utility rather than a bash builtin; the match form used here is a GNU extension):
INPUT="someletters_12345_moreleters.ext"
SUBSTRING=`expr match "$INPUT" '.*_\([[:digit:]]*\)_.*' `
echo $SUBSTRING
Upvotes: 2
Reputation: 17004
Building on jor's answer (which doesn't work for me):
substring=$(expr "$filename" : '.*_\([^_]*\)_.*')
Upvotes: 23