Reputation: 129
I've been searching for a long time and have not been able to find a working answer to my problem.
I have a line from an HTML file, extracted with sed '162!d' skinlist.html, which contains the text <a href="/skin/dwarf-red-beard-734/" title="Dwarf Red Beard">.
I want to extract the text Dwarf Red Beard, but that text is variable (it can change), so I would like to extract whatever sits between title=" and the closing ".
I cannot, for the life of me, figure out how to do this.
Upvotes: 2
Views: 2302
Reputation: 1734
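Solution in sed:
sed -n '162 s/^.*title="\([^"]*\)".*$/\1/p' skinlist.html
This operates only on line 162 of skinlist.html, captures the contents of the title attribute in \1, and prints just that capture. Using [^"]* instead of .* in the group stops the capture at the attribute's closing quote, even if other quoted attributes follow on the same line.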
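A quick way to check the substitution without touching the real file is to feed it sample lines. The sketch below is only an illustration: the helper name extract_title and the second sample line (with a trailing class attribute) are made up here, and the capture group uses [^"]* so it stops at the first closing quote.

```shell
# Sample lines standing in for line 162 of skinlist.html.
# line2, with its extra class attribute, is hypothetical.
line1='<a href="/skin/dwarf-red-beard-734/" title="Dwarf Red Beard">'
line2='<a href="/skin/dwarf-red-beard-734/" title="Dwarf Red Beard" class="skin">'

# Hypothetical helper: print the title attribute's value, or nothing if absent.
extract_title() {
    printf '%s\n' "$1" | sed -n 's/^.*title="\([^"]*\)".*$/\1/p'
}

title1=$(extract_title "$line1")
title2=$(extract_title "$line2")
echo "$title1"
echo "$title2"
```

Because of -n and the p flag, a line without a title attribute produces no output at all, which makes a missing match easy to detect.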
Upvotes: 1
Reputation: 2524
You can pass it through another sed, or add expressions to that sed, like:
-e 's/.*title="//g' -e 's/">.*$//g'
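Put together with the line-selection step, the whole thing might look like the sketch below. It runs against a throwaway file created with mktemp, so the line number (3 instead of 162) and the file contents are assumptions for the demo. Note that the second expression relies on title being the last attribute in the tag, since it looks for ">.

```shell
# Build a throwaway file whose line 3 holds the anchor tag
# (in the real skinlist.html the tag is on line 162).
tmp=$(mktemp)
printf '%s\n' '<html>' '<body>' \
    '<a href="/skin/dwarf-red-beard-734/" title="Dwarf Red Beard">' > "$tmp"

# Keep only line 3, strip everything through title=",
# then strip from "> to the end of the line.
result=$(sed -e '3!d' -e 's/.*title="//' -e 's/">.*$//' "$tmp")
echo "$result"

rm -f "$tmp"
```

The g flags from the original expressions are dropped here because each pattern can match at most once per line.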
Upvotes: 0
Reputation: 1
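awk 'NR==162 {print $4}' FS='"' skinlist.html
Setting the field separator to a double quote splits line 162 at every ", so the title text lands in the fourth field.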
Upvotes: 2
Reputation: 125788
The shell's variable expansion syntax allows you to trim prefixes and suffixes from a string:
line="$(sed '162!d' skinlist.html)" # extract the relevant line from the file
temp="${line#* title=\"}" # remove from the beginning through the first match of ' title="'
if [ "$temp" = "$line" ]; then
    echo "title not found in '$line'" >&2
else
    title="${temp%%\"*}" # remove from the first '"' through the end
fi
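The same two expansions can be wrapped in a small function so the result is printed rather than left in a variable. This is a minimal sketch: the function name get_title and its internal variable names are invented for the example, and it takes the line as an argument instead of reading skinlist.html.

```shell
# Hypothetical helper: print the value of the title attribute found in $1.
# Returns 1 (with a message on stderr) if no ' title="' is present.
get_title() {
    gt_line=$1
    # Drop the shortest prefix ending in ' title="'.
    gt_temp="${gt_line#* title=\"}"
    if [ "$gt_temp" = "$gt_line" ]; then
        echo "title not found in '$gt_line'" >&2
        return 1
    fi
    # Drop the longest suffix starting at a '"', i.e. from the first quote on.
    printf '%s\n' "${gt_temp%%\"*}"
}

title=$(get_title '<a href="/skin/dwarf-red-beard-734/" title="Dwarf Red Beard">')
echo "$title"
```

Since it uses only parameter expansion, no extra process is spawned for the extraction itself; it could be called as get_title "$(sed '162!d' skinlist.html)".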
Upvotes: 0