Reputation: 17

substring extraction in bash

iamnewbie: This code is inefficient, but it should extract the substring. The problem is with the last echo statement; I need some insight.

    function regex {


    #this function gives the regular expression needed



    echo -n \'

    for (( i = 1 ; i <= $1  ; i++ ))
         do
           echo -n . 
        done

    echo -n '\('

    for (( i = 1 ; i <= $2 ; i++ ))
         do
           echo -n . 
         done

     echo -n '\)'
     echo -n \'

     }
     # regex function ends


      echo "Enter the string:"

      read stg
      #variable stg holds the string entered

      if [ -z "$stg" ] ; then

           echo "Null string"

      exit

      else

           echo "Length of the $stg is:"

           z=`expr "$stg" : '.*' `

           #variable z holds the length of given string

           echo $z

      fi

      echo "Enter the number of trailing characters to be extracted from  $stg:"

      read n

      m=`expr $z - $n `
      #variable m holds an integer value which is equal to total length - length of characters to be extracted

      x=$(regex $m $n)

      echo ` expr "$stg" : "$x" `
      #the echo statement(above) is just printing a newline!! But not the result

What I intend to do with this code: if I enter "racecar" and give "3", it should display "car", which are the last three characters. Instead of displaying "car", it just prints a newline. Please correct this code rather than giving a better one.

Upvotes: 1

Answers (3)

rici

Reputation: 241931

Although you didn't ask for a better solution, it's worth mentioning:

$ n=3
$ stg=racecar
$ echo "${stg: -n}"
car

Note that the space after the : in ${stg: -n} is required. Without the space, the parameter expansion is a default-value expansion rather than a substring expansion. With the space, it's a substring expansion; -n is interpreted as an arithmetic expression (which means that n is interpreted as $n) and since the result is a negative number, it specifies the number of characters from the end to start the substring. See the Bash manual for details.
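To make the difference concrete, here is a small sketch, assuming bash. The variable names (tail_part, tail_paren, no_space, missing, default_used) are only illustrative.

```bash
#!/bin/bash
stg=racecar
n=3

# Space after the colon: substring expansion; the offset is the arithmetic value of -n.
tail_part="${stg: -n}"

# Parentheses also keep the minus sign away from the colon.
tail_paren="${stg:(-n)}"

# No space: default-value expansion. stg is set and non-empty, so its value is used unchanged.
no_space="${stg:-n}"

# With an unset variable, the default-value form yields the literal text "n".
unset missing
default_used="${missing:-n}"

printf '%s\n' "$tail_part" "$tail_paren" "$no_space" "$default_used"
```

This should print car, car, racecar and n, one per line.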

Your solution is based on evaluating the equivalent of:

expr "$stg" : '......\(...\)'

with an appropriate number of dots. It's important to understand what the above bash syntax actually means. It invokes the command expr, passing it three arguments:

arg 1: the contents of the variable stg

arg 2: :

arg 3: ......\(...\)

Note that there are no quotes visible. That's because the quotes are part of bash syntax, not part of the argument values.

If the value of stg had enough characters, the result of the above expr invocation would be to print out the 7th, 8th and 9th characters of the value of stg. Otherwise, it would print a blank line, and fail.
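Both outcomes can be checked directly. This is a minimal sketch; the ten-character test string and the variable names are made up for illustration.

```bash
#!/bin/bash
pattern='......\(...\)'   # six arbitrary characters, then a captured group of three

# Ten characters: the pattern fits, so expr prints characters 7 to 9.
long_status=0
long_result=$(expr "abcdefghij" : "$pattern") || long_status=$?

# Seven characters: the pattern needs nine, so expr prints an empty line
# and exits with a non-zero status.
short_status=0
short_result=$(expr "racecar" : "$pattern") || short_status=$?

printf 'long:  [%s] status %s\n' "$long_result" "$long_status"
printf 'short: [%s] status %s\n' "$short_result" "$short_status"
```

The first line should show ghi with status 0, the second an empty result with status 1.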

But that's not what you are doing. You're creating the regular expression:

'......\(...\)'

which has single quotes in it. Since single-quotes are not special characters in a regex, they match themselves; in other words, that pattern will match a string which starts with a single quote, followed by nine arbitrary characters, followed by another single quote. And if the string does match, it will print the three characters prior to the second single-quote.

Of course, since the regular expression you make has a . for every character in the target string, it won't match the target even if the target started and ended with a single quote, since there would be too many dots in the regex to match that.
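The effect of the literal quotes can be seen with a fixed pattern. In this sketch the pattern is a hand-written copy of what the question's function appears to print for a 7-character string and n=3 (my reading of the code, not captured output), and the second test string is wrapped in quotes by hand.

```bash
#!/bin/bash
# Pattern with literal single quotes at both ends, as the question's function builds it.
quoted_pattern="'....\(...\)'"

# Plain input: the leading quote in the pattern has nothing to match.
no_match=$(expr "racecar" : "$quoted_pattern") || true

# Input wrapped in literal quotes: this fixed pattern now fits and captures "car".
match=$(expr "'racecar'" : "$quoted_pattern") || true

printf 'plain:  [%s]\n' "$no_match"
printf 'quoted: [%s]\n' "$match"
```

Note that the pattern is held fixed here. In the original script the number of dots is derived from the length of the input, quotes included, which is why even a quoted input would not match there.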

If you don't put single quotes into the regex, then your program will work, but I have to say that few times have I seen such an intensely circuitous implementation of the substring function. If you're not trying to win an obfuscated bash competition (a difficult challenge since most production bash code is obfuscated by nature), I'd suggest you use normal bash features instead of trying to do everything with regexen.
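Since the question asked for a correction rather than a replacement, here is a sketch of the same approach with the quotes removed. I've swapped echo -n for printf and hard-coded the inputs instead of using read so it can be run as-is; those changes are mine, not part of the original script.

```bash
#!/bin/bash
# Build a pattern of $1 skipped characters followed by a captured group of $2 characters.
# Unlike the original, no literal single quotes are emitted.
regex() {
    local i
    for (( i = 1; i <= $1; i++ )); do printf '.'; done
    printf '%s' '\('
    for (( i = 1; i <= $2; i++ )); do printf '.'; done
    printf '%s' '\)'
}

stg=racecar            # stands in for the value read from the user
n=3                    # number of trailing characters wanted
z=${#stg}              # length of the string
m=$(( z - n ))         # characters to skip before the captured group
x=$(regex "$m" "$n")   # the pattern, with no quotes around it
result=$(expr "$stg" : "$x")
echo "$result"
```

With these inputs the pattern is ....\(...\) and the script should print car.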

One of those is the syntax to determine the length of a string:

$ stg=racecar
$ echo ${#stg}
7

(although, as shown at the beginning, you don't actually even need that.)

Upvotes: 3

idobr

Reputation: 1647

I'm not sure you need loops for this task. I wrote an example that reads two parameters from the user and cuts the word accordingly.

#!/bin/bash
read -p "Enter some word? " -e stg
#variable stg holds the string entered
if [ -z "$stg" ] ; then
  echo "Null string"
  exit 1
fi

read -p "Enter some number to set word length? " -e cutNumber
# check that cutNumber is a number ([ -eq ] fails on non-integers; its error message is suppressed)
if ! [ "$cutNumber" -eq "$cutNumber" ] 2>/dev/null; then
  echo "Not a number!"
  exit 1
fi
echo "Cut first n characters:"
echo "${stg:$cutNumber}"
echo
echo "Show first n characters:"
echo "${stg:0:$cutNumber}"

echo "Alternative get last n characters:"
printf '%s' "$stg" | tail -c "$cutNumber"
echo

Example:

Enter some word? TheRaceCar
Enter some number to set word length? 7
Cut first n characters:
Car

Show first n characters:
TheRace
Alternative get last n characters:
RaceCar

Upvotes: 2

zerodiff

Reputation: 1700

What about:

$ n=3
$ string="racecar"
$ [[ "$string" =~ (.{$n})$ ]]
$ echo ${BASH_REMATCH[1]}
car

This looks for the last n characters at the end of the line. In a script:

#!/bin/bash

read -p "Enter a string: " string
read -p "Enter the number of characters you want from the end: " n
[[ "$string" =~ (.{$n})$ ]]
echo "These are the last $n characters: ${BASH_REMATCH[1]}"

You may want to add some more error handling, but this'll do it.
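As one possible shape for that error handling, here is a sketch that wraps the same regex match in a function. The name last_n, the particular checks and the messages are my own assumptions, not part of the answer above.

```bash
#!/bin/bash
# last_n STRING N: print the last N characters of STRING, or return 1 with a message.
last_n() {
    local string=$1 n=$2
    # Accept only a non-negative decimal integer.
    if ! [[ $n =~ ^[0-9]+$ ]]; then
        echo "Not a non-negative integer: $n" >&2
        return 1
    fi
    n=$(( 10#$n ))   # force base 10 so leading zeros are not read as octal
    # Refuse to extract more characters than the string holds.
    if (( n > ${#string} )); then
        echo "String has only ${#string} characters" >&2
        return 1
    fi
    # Same match as above: capture exactly n characters anchored at the end.
    if [[ $string =~ (.{$n})$ ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

last_n racecar 3                       # prints: car
last_n racecar 10 || echo "rejected"   # error message on stderr, then: rejected
```

Returning a status instead of exiting keeps the function usable from an interactive prompt as well as from a script.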

Upvotes: 2
