Reputation: 81
I have a file containing the following content:
(visible:true)
url(http://style.ep.com/image/control/flash1-tab.gif)
<img src="http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg" alt="god">
<script src="http://img1.ep.com/4667/codeFromLink.js"></script>
I want to extract the content between url( and ), and between src=" and ". The expected result is:
http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js
I've tried the following:
awk 'BEGIN{RS=")";FS="("}NF>1{print $NF}' $file_obj
awk 'BEGIN{RS=" ";FS="src=\""}NF>1{print($NF)}' $file_obj |sed 's/\"//g'
But I got:
visible:true
http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js></script>
How can I do this? Thanks a lot.
Upvotes: 0
Views: 119
Reputation: 41456
Here is another GNU awk solution (GNU awk is needed because RS contains multiple characters):
awk -v RS="http" -F'[")]' 'NR>1{print RS$1}' file
http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js
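Where a multi-character RS is not available, a rough portable sketch is to keep the default record separator and pull out each url(...) or src="..." occurrence with match() in a loop. Note that this is a different technique from the RS approach above, and the function name extract_urls is hypothetical, chosen only for illustration:

```shell
# extract_urls: hypothetical helper, not part of the original answer.
# Reads the named files (or stdin) and prints every url(...) / src="..." value.
extract_urls() {
  awk '{
    line = $0
    # Find the next url(...) or src="..." occurrence in what is left of the line.
    while (match(line, /url\([^)]*\)|src="[^"]*"/)) {
      tok = substr(line, RSTART, RLENGTH)
      sub(/^(url\(|src=")/, "", tok)   # strip the leading url( or src="
      sub(/[")]$/, "", tok)            # strip the closing ) or "
      print tok
      line = substr(line, RSTART + RLENGTH)  # continue after this match
    }
  }' "$@"
}

extract_urls "$@"
```

Because the loop keeps scanning the remainder of each line, this sketch also reports several URLs that share one line, which the sample input does not contain but real HTML often does.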
Upvotes: 0
Reputation: 10653
Very short grep solution:
grep -Po '(url\(|src=")\K[^")]*' "$file_obj"
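\K discards everything matched so far from the reported match, so only the text that follows the url( or src=" prefix is printed.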
Or a bit longer, but safer:
grep -Po 'url\(\K[^)]*|src="\K[^"]*' "$file_obj"
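To illustrate why the longer pattern is safer, here is a small sketch that runs both patterns on a made-up line whose src value contains a ) character. The input line is an assumption for illustration only, and the commands need a grep built with PCRE support (-P):

```shell
# Hypothetical input: a src URL that itself contains a ")".
line='<img src="http://img1.ep.com/a(1).png">'

# The combined class [^")]* stops at the first " or ), whichever prefix matched.
short=$(printf '%s\n' "$line" | grep -Po '(url\(|src=")\K[^")]*')

# The per-prefix classes stop only at the delimiter that belongs to that prefix.
safer=$(printf '%s\n' "$line" | grep -Po 'url\(\K[^)]*|src="\K[^"]*')

printf 'short: %s\nsafer: %s\n' "$short" "$safer"
```

With GNU grep, the short pattern yields http://img1.ep.com/a(1 (cut at the embedded parenthesis), while the safer one yields the full http://img1.ep.com/a(1).png.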
Upvotes: 1
Reputation: 437753
A streamlined awk solution:
awk -F'url\\(|\\)|src="|"' 'length($2) {print $2}' file
-F'url\\(|\\)|src="|"' defines a regular expression to use as the field separator (stored in the reserved variable FS, which the -F command-line option sets), effectively comprising the following tokens: url(, ), src=", and ".

Note that ( and ) must be double-escaped. awk's general string parsing interprets \ escape sequences in a first pass, so the \\ tells it that a literal \ should become part of the resulting regular expression, so that the regex engine sees, for instance, \(, i.e., a ( character that should be taken literally (rather than starting a capture group).

As a result, the URL of interest always ends up in the second field, $2.

length($2) (implied: length($2) > 0) ensures that the print command, {print $2}, is only executed for lines where a URL is found.

Caveat: Won't work with URLs that have embedded ) characters, but that's rare in practice.
To fix this, use the following instead:
awk -F'url\\(|\\)([[:blank:]]|$)|src="|"' 'length($2) {print $2}' file
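To see how the first (basic) field separator splits a line, the following sketch prints every field it produces. The helper name show_fields is hypothetical and exists only for this illustration; the input is one of the lines from the question:

```shell
# show_fields: hypothetical helper that prints each field produced by the
# answer's basic field separator, one per line, as $N=[value].
show_fields() {
  awk -F'url\\(|\\)|src="|"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }' "$@"
}

printf '%s\n' '<img src="http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg" alt="god">' | show_fields
```

For this line the separator yields five fields, with the URL in $2 and the alt value in $4. For the (visible:true) line, $2 is empty, which is why the length($2) condition skips it.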
Upvotes: 0
Reputation: 203502
$ sed -r -n -e 's/url\(([^)]+).*/\1/p' file
http://style.ep.com/image/control/flash1-tab.gif
$ sed -r -n -e 's/.*src="([^"]+).*/\1/p' file
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js
$ sed -r -n -e 's/url\(([^)]+).*/\1/p' -e 's/.*src="([^"]+).*/\1/p' file
http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js
Upvotes: 0
Reputation: 174706
Try grep with the -oP options:
$ grep -oP '(?<=url\()[^)]*|(?<=src=\")[^"]*' file
http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js
With awk:
$ awk -F\( '/^url/{sub(/.$/,"",$2); print $2}/src=/{split($0,a,"\""); print a[2]}' file
http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js
Upvotes: 0