cosbor11

Reputation: 16034

Capture text between two tokens

I'm trying to get the text between two tokens.

For example, let's say the text is:

arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end

The output should be: CaptureThis

And the two tokens are: :start: and /end

The closest I could get was using this regex:

INPUT="arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end"
VALUE=$(echo "${INPUT}" | sed -e 's/:start:\(.*\)\/end/\1/')

... but this still returns most of the string: arn:aws:dfasdfasdf/asdfaCaptureThis

How do I get all of the other text out of the way?

Upvotes: 2

Views: 132

Answers (4)

Benjamin W.

Reputation: 52291

You could use GNU grep with the -P option (Perl-compatible regular expressions, which support look-arounds) and the -o option to print only the matched part:

$ grep -Po '(?<=:start:).*(?=/end)' <<< 'arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end'
CaptureThis
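One thing to watch with this pattern: `.*` is greedy, so if a line contains more than one `/end`, the match runs to the last one. A non-greedy `.*?` stops at the first `/end` after each `:start:`. The sample input below is made up for illustration, and the sketch assumes GNU grep built with PCRE support:

```bash
# Greedy: .* extends to the LAST /end on the line.
printf '%s\n' 'x:start:a/end:start:b/end' | grep -Po '(?<=:start:).*(?=/end)'
# prints: a/end:start:b

# Lazy: .*? stops at the FIRST /end after each :start:, and -o prints
# every match on its own line.
printf '%s\n' 'x:start:a/end:start:b/end' | grep -Po '(?<=:start:).*?(?=/end)'
# prints:
# a
# b
```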

Upvotes: 3

David C. Rankin

Reputation: 84579

There is no need for any external utilities; bash parameter expansion will handle it all for you:

INPUT="arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end"
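token=${INPUT##*:}     # remove everything up to and including the last ":"
echo "${token%/*}"     # remove the last "/" and everything after it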

Output

CaptureThis
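The expansions above key on the last `:` and the last `/` rather than on the tokens themselves. That works for this input because `:start:` ends at the last colon and `/end` begins at the last slash. A sketch that keys on the literal tokens instead, using a hypothetical helper named `extract_between`:

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the text between the first occurrence of a
# start token and the first end token that follows it.
extract_between() {
  local input=$1 start=$2 end=$3 rest
  # Quoting "$start" and "$end" inside the patterns makes them match
  # literally instead of as glob patterns.
  rest=${input#*"$start"}            # drop everything through the first start token
  printf '%s\n' "${rest%%"$end"*}"   # drop the first end token and everything after it
}

extract_between "arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end" ":start:" "/end"
# prints: CaptureThis
```

If the start token is missing, `${input#*"$start"}` leaves the string unchanged, so you may want to check that both tokens are present before trusting the result.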

Upvotes: 2

mklement0

Reputation: 439228

Try this:

$ sed 's/^.*:start:\(.*\)\/end.*$/\1/' <<<'arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end'
CaptureThis

The problem with your approach was that you only replaced part of the input line, because your regex didn't match the entire line.

Note how the command above anchors the regex both at the beginning of the line (^.*) and at the end (.*$) so as to ensure that the entire line is matched and thus replaced.
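With this command, a line that does not contain both tokens is passed through unchanged, because the `s` command simply fails to match. If you would rather get no output in that case, a variant (wrapped in a hypothetical `capture` function for illustration) uses `-n` to suppress automatic printing and the `p` flag to print only lines where the substitution succeeded:

```bash
# Hypothetical wrapper: reads lines on stdin and prints only the captured
# text from lines that contain both :start: and /end.
capture() {
  sed -n 's/^.*:start:\(.*\)\/end.*$/\1/p'
}

printf '%s\n' 'arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end' | capture
# prints: CaptureThis

printf '%s\n' 'no tokens here' | capture
# prints nothing
```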

Upvotes: 2

racraman

Reputation: 5034

You could use:

VALUE=$(echo "${INPUT}" | sed -e 's/.*:start:\(.*\)\/end.*/\1/')

If the tokens are liable to change, you could put them in variables. However, since "/end" contains a "/", it would clash with sed's default "/" delimiter, so you'd want to switch to a non-conflicting delimiter such as "?":

TOKEN1=":start:"
TOKEN2="/end"
VALUE=$(echo "${INPUT}" | sed -e "s?.*$TOKEN1\(.*\)$TOKEN2.*?\1?")
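If you are in bash anyway, another option that avoids the delimiter problem altogether is the `=~` operator of `[[ ]]`. Since bash 3.2, quoted parts of the right-hand side are matched as literal strings, so the "/" in `$TOKEN2` needs no escaping, and the text matched by the capture group lands in `BASH_REMATCH[1]`. A sketch reusing the variable names from above:

```bash
#!/usr/bin/env bash
INPUT="arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end"
TOKEN1=":start:"
TOKEN2="/end"

# Quoted "$TOKEN1" and "$TOKEN2" are matched literally; only (.*) is
# treated as regex syntax.
if [[ $INPUT =~ "$TOKEN1"(.*)"$TOKEN2" ]]; then
  VALUE=${BASH_REMATCH[1]}   # text captured by the parenthesized group
  echo "$VALUE"              # prints: CaptureThis
fi
```

As with the sed version, `.*` is greedy, so with several "/end" tokens on one line it captures up to the last one.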

Upvotes: 2
