Papu

Reputation: 47

sed expression just prints the entire file

I've been trying to extract the link path, /torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv, from the following:

<a href="/torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a>

But for whatever reason, this sed expression

sed --regexp-extended 's#<a href="(/torrent/.+)/">.*</a>#\1#'

is printing the entire file, when I only want the \1 capture group to be printed.

The weird thing is that this expression worked fine when I debugged it with desed: both the capture group and the overall match showed up as expected.

I'm using GNU sed 4.8-1.

Upvotes: 3


Answers (1)

Wiktor Stribiżew

Reputation: 627219

You can use

sed -n -E '/.*<a href="(\/torrent\/[^"]*)\/">[^<]*<\/a>.*/{s//\1/p;q}'

Details:

  • -n - suppresses default line output
  • -E - enables POSIX ERE regex syntax
  • /.*<a href="(\/torrent\/[^"]*)\/">[^<]*<\/a>.*/ - finds a line containing an <a href=".../">...</a> substring, capturing the part between href=" and the trailing /"
  • {s//\1/p;q} - replaces the whole matched line with the captured substring (the empty regex // reuses the last regex, i.e. the address above), prints it and quits.
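Why the question's command prints the whole file: without -n, sed auto-prints every input line, so lines that don't match the s command pass through unchanged; and even on the matching line, s only replaces the matched <a ...>...</a> part, leaving any surrounding text in place. A minimal sketch of this, assuming a made-up three-line input (the <td> wrapper is hypothetical) and using -E, which GNU sed treats the same as --regexp-extended:

```shell
# Made-up sample input: one matching line (wrapped in a hypothetical <td>)
# between two non-matching lines.
s='blah
<td><a href="/torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a></td>
blah'

# The question's command: no -n, so every line is auto-printed; the "blah"
# lines and the <td>...</td> around the match survive unchanged.
orig_out=$(printf '%s\n' "$s" | sed -E 's#<a href="(/torrent/.+)/">.*</a>#\1#')

# Minimal fix: -n disables auto-printing, the p flag prints only lines where
# the substitution succeeded, and the leading/trailing .* consume the whole line.
fixed_out=$(printf '%s\n' "$s" | sed -n -E 's#.*<a href="(/torrent/[^"]*)/">.*</a>.*#\1#p')

printf 'original:\n%s\n' "$orig_out"
printf 'fixed:\n%s\n' "$fixed_out"
```

So adding -n plus the p flag, and wrapping the pattern in .* so the whole line is consumed, should be enough to make the question's s command print only the capture group.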

Here is a demo:

s='blah
<a href="/torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a>
blah
<a href="/torrent/1111111/AAAAAAAAAAAAAAAAAAAAAA.mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a>
blah'
sed -n -E '/.*<a href="(\/torrent\/[^"]*)\/">[^<]*<\/a>.*/{s//\1/p;q}' <<< "$s"
# => /torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv
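Because of q, sed stops after the first matching line, which is why the demo prints only the 4384536 path even though the input contains two links. If every link is wanted, dropping q should be enough. A sketch with the same sample input, piping through printf instead of a here-string so it also runs under a plain POSIX sh:

```shell
s='blah
<a href="/torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a>
blah
<a href="/torrent/1111111/AAAAAAAAAAAAAAAAAAAAAA.mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a>
blah'

re='.*<a href="(\/torrent\/[^"]*)\/">[^<]*<\/a>.*'

# With q: print the first match, then quit (the answer's command).
first=$(printf '%s\n' "$s" | sed -n -E "/$re/{s//\1/p;q}")

# Without q: print the captured path from every matching line.
all=$(printf '%s\n' "$s" | sed -n -E "/$re/{s//\1/p;}")

printf 'first:\n%s\n' "$first"
printf 'all:\n%s\n' "$all"
```

The regex is kept in a shell variable only to avoid repeating it; inside the double-quoted sed scripts, \1 reaches sed unchanged because the shell leaves backslash-digit alone in double quotes.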

Upvotes: 1
