Arthur

Reputation: 49

Extract a substring with sed

I have the following strings, for example:

     input1 = abc-def-ghi-jkl

     input2 = mno-pqr-stu-vwy

I want to extract the first word that sits between two "-" characters.

For the first string I want to get: def

If the input is the second string, I want to get: pqr

I want to use the sed command. Could you help me, please?

Upvotes: 1

Views: 5898

Answers (4)

user5683823

Reputation:

grep solution (in my opinion this is the most natural approach: you are only trying to find matches to a regular expression, not edit anything, so there should be no need for the more advanced sed command)

grep -oP '^[^-]*-\K[^-]*(?=-)' << EOF
abc-qrs-bobo-the-clown
123-45-6789
blah-blah-blah
no dashes here
mahi-mahi
EOF

Output

qrs
45
blah

Explanation

Look at the inputs first; they are included here as a heredoc for completeness (more likely you would pass your file name as the last argument to grep). The solution requires at least two dashes in the string; in particular, it finds no match for mahi-mahi. If you want the second mahi to match, remove the lookahead assertion at the end of the regular expression (see below).

Here is how the command works. First, the options: -o prints only the matched substring, not the entire line, and -P enables Perl-compatible regular expressions. Then the regular expression: start at the beginning of the line (^); match zero or more non-dash characters followed by a dash, then use \K to discard that part from the reported match. Next, match zero or more non-dash characters again; this is what the command prints. Finally, require a dash after this part without including it in the match. This is done with a lookahead (written as (?= ... )).
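For comparison, here is a sketch of the variant without the trailing lookahead. It assumes a grep that supports -P (GNU grep built with PCRE; not every grep has it), and the variable name `result` is only for illustration. Since only one dash is now required, mahi-mahi yields the second mahi, while "no dashes here" still produces nothing.

```shell
#!/usr/bin/env bash
# Same sample lines as above, fed through printf instead of a heredoc.
# The (?=-) lookahead is dropped, so a single dash is enough for a match.
result=$(printf '%s\n' 'abc-qrs-bobo-the-clown' '123-45-6789' 'blah-blah-blah' \
  'no dashes here' 'mahi-mahi' | grep -oP '^[^-]*-\K[^-]*')
printf '%s\n' "$result"
```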

Upvotes: 0

Walter A

Reputation: 19982

If you want to use sed, you can choose between solutions like:

input1='abc-def-ghi-jkl'
# Double processing
echo "$input1" | sed 's/[^-]*-//;s/-.*//'
# Normal approach
echo "$input1" | sed -r 's/^[^-]*-([^-]*).*/\1/'
# Funny alternative
echo "$input1" | sed -r 's/(^[^-]*-|-.*)//g'
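One way to confirm that the three sed approaches agree is to run them over both sample strings from the question. This is only a sketch: the loop, the variable names, and the use of -E (which GNU sed treats the same as -r, and which BSD sed also accepts) are choices made here, and the capture-group version is written as `s/^[^-]*-([^-]*).*/\1/`.

```shell
#!/usr/bin/env bash
input1='abc-def-ghi-jkl'
input2='mno-pqr-stu-vwy'
out=''
for s in "$input1" "$input2"; do
  # Double processing: drop the first field and its dash, then everything from the next dash on
  a=$(echo "$s" | sed 's/[^-]*-//;s/-.*//')
  # Capture group: keep only the text between the first and second dash
  b=$(echo "$s" | sed -E 's/^[^-]*-([^-]*).*/\1/')
  # Alternation: delete the leading field plus dash, and any "-..." tail
  c=$(echo "$s" | sed -E 's/(^[^-]*-|-.*)//g')
  out+="$a $b $c"$'\n'
done
printf '%s' "$out"
```

Each line of output should show the same word three times (def for the first string, pqr for the second).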

The obvious "external" tool would be cut. You can also look at a Bash builtin solution like

[[ ${input1} =~ ([^-]*)-([^-]*) ]] && printf %s "${BASH_REMATCH[2]}"

Upvotes: 0

Freddy

Reputation: 4688

With bash:

var='input1 = abc-def-ghi-jkl'
var=${var#*-}      # remove shortest prefix `*-`, this removes `input1 = abc-`
echo "${var%%-*}"  # remove longest suffix `-*`, this removes `-ghi-jkl`
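The same two parameter expansions can be wrapped in a small function and applied to either input string. The function name `middle_field` is hypothetical, introduced only for this sketch. Note that a string containing no dash comes back unchanged, because neither pattern matches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text between the first and second dash of $1
middle_field() {
  local v=${1#*-}           # drop the shortest prefix ending in "-" (e.g. "abc-")
  printf '%s\n' "${v%%-*}"  # drop the longest suffix starting with "-" (e.g. "-ghi-jkl")
}
middle_field 'abc-def-ghi-jkl'
middle_field 'mno-pqr-stu-vwy'
```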

Or with awk:

awk -F'-' '{print $2}' <<<'input1 = abc-def-ghi-jkl'

Use - as input field separator and print the second field.
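When the input may contain lines without any dash, the plain awk command prints an empty line for them, because $2 does not exist. A minimal sketch, assuming you would rather skip such lines, guards the print with the field count (NF); the sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the second dash-separated field, skipping lines with no dash
# (such lines have at most one field, so NF <= 1).
out=$(printf '%s\n' 'abc-def-ghi-jkl' 'no dashes here' 'mno-pqr-stu-vwy' |
  awk -F'-' 'NF > 1 { print $2 }')
printf '%s\n' "$out"
```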


Or with cut:

cut -d'-' -f2 <<<'input1 = abc-def-ghi-jkl'
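One detail worth knowing about cut: a line that contains no delimiter is passed through unchanged, unless -s is given, in which case it is suppressed. The sketch below contrasts the two on made-up sample lines.

```shell
#!/usr/bin/env bash
sample=$(printf '%s\n' 'abc-def-ghi-jkl' 'no dashes here' 'mno-pqr-stu-vwy')
# Without -s, the line with no "-" is printed as-is
plain=$(printf '%s\n' "$sample" | cut -d'-' -f2)
# With -s, lines without the delimiter are dropped
strict=$(printf '%s\n' "$sample" | cut -s -d'-' -f2)
printf '%s\n---\n%s\n' "$plain" "$strict"
```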

Upvotes: 1

Ryszard Czech

Reputation: 18611

Use

sed 's,^[^-]*-\([^-]*\).*,\1,' file

The text after the first - is captured up to the next -, the rest of the line is matched as well, and the whole matched line is replaced with the captured group (\1). The commas are used as the s command's delimiter instead of /.
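Lines that do not match (for example, lines without a dash) are printed unchanged by this command. A sketch, using a temporary file as a stand-in for `file`, shows that behaviour and the common `-n` plus `p` flag idiom that prints only lines where the substitution succeeded.

```shell
#!/usr/bin/env bash
# Temporary stand-in for "file", holding the two strings from the question
# plus one line without a dash
tmp=$(mktemp)
printf '%s\n' 'abc-def-ghi-jkl' 'mno-pqr-stu-vwy' 'no dashes here' > "$tmp"
# As in the answer: non-matching lines pass through untouched
all=$(sed 's,^[^-]*-\([^-]*\).*,\1,' "$tmp")
# -n suppresses automatic printing; the p flag prints only substituted lines
only=$(sed -n 's,^[^-]*-\([^-]*\).*,\1,p' "$tmp")
rm -f "$tmp"
printf '%s\n---\n%s\n' "$all" "$only"
```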

Upvotes: 2
