I've been trying to extract the href path from the following:
<a href="/torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a>
But for whatever reason, this sed expression
sed --regexp-extended 's#<a href="(/torrent/.+)/">.*</a>#\1#'
returns the entire file, when I only want the \1 capture group to be printed.
The weird thing is that this expression worked fine when I debugged it with desed, with both the capture group and the overall match showing up correctly.
I'm using GNU sed 4.8-1.
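A minimal reproduction of the behaviour, assuming GNU sed as in the question and a small hypothetical input in place of the real file. sed prints every input line by default, and s/// only rewrites the lines it matches, so unmatched lines pass through untouched. Adding -n (turn off auto-print) and the p flag (print only when the substitution succeeded) keeps just the captured path:

```shell
#!/usr/bin/env bash
# Hypothetical sample input standing in for the real file:
# one link line surrounded by unrelated lines.
s='blah
<a href="/torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a>
blah'

# Original command: the link line is rewritten to the captured path,
# but sed still auto-prints every line, so both "blah" lines come through too.
printf '%s\n' "$s" | sed --regexp-extended 's#<a href="(/torrent/.+)/">.*</a>#\1#'

# Same expression with -n (no auto-print) and the p flag (print only
# after a successful substitution): only the captured path is printed.
printf '%s\n' "$s" | sed -n --regexp-extended 's#<a href="(/torrent/.+)/">.*</a>#\1#p'
```

If the link sits inside a longer line, the text before <a and after </a> is not part of the match and is kept as well; wrapping the pattern in .* on both sides avoids that.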
You can use
sed -n -E '/.*<a href="(\/torrent\/[^"]*)\/">[^<]*<\/a>.*/{s//\1/p;q}'
Details:
-n - suppresses default line output
-E - enables POSIX ERE regex syntax
/.*<a href="(\/torrent\/[^"]*)\/">[^<]*<\/a>.*/ - finds a line containing an <a href=".../">...</a> substring, capturing the part between href=" and /"
{s//\1/p;q} - the empty regex in s// reuses the pattern above, so the whole line is replaced with the captured substring; p prints it and q quits after the first match.

See the online demo:
s='blah
<a href="/torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a>
blah
<a href="/torrent/1111111/AAAAAAAAAAAAAAAAAAAAAA.mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a>
blah'
sed -n -E '/.*<a href="(\/torrent\/[^"]*)\/">[^<]*<\/a>.*/{s//\1/p;q}' <<< "$s"
# => /torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv
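Because of q, the demo stops after the first link. A sketch for the case where every matching link is wanted, assuming the same sample input: drop q (and the braces, since only one command remains), and sed keeps reading and prints each captured path.

```shell
#!/usr/bin/env bash
# Same sample input as the demo: two /torrent/ links among filler lines.
s='blah
<a href="/torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a>
blah
<a href="/torrent/1111111/AAAAAAAAAAAAAAAAAAAAAA.mkv/">[HorribleSubs] Black Clover - 128 [720p].mkv</a>
blah'

# Without q, sed continues past the first hit, so every matching line
# is reduced to its captured path and printed.
sed -n -E '/.*<a href="(\/torrent\/[^"]*)\/">[^<]*<\/a>.*/s//\1/p' <<< "$s"
# => /torrent/4384536/HorribleSubs-Black-Clover-128-720p-mkv
# => /torrent/1111111/AAAAAAAAAAAAAAAAAAAAAA.mkv
```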