user3818534

Reputation: 81

How do I extract the content between specific delimiter strings with awk or sed in the shell?

I have a file containing the following content:

(visible:true)
url(http://style.ep.com/image/control/flash1-tab.gif)
<img src="http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg" alt="god">
<script src="http://img1.ep.com/4667/codeFromLink.js"></script>

I want to get the content between url( and ), and also between src=" and ". The result should look like this:

http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js

I've tried the following:

awk 'BEGIN{RS=")";FS="("}NF>1{print $NF}' $file_obj
awk 'BEGIN{RS=" ";FS="src=\""}NF>1{print($NF)}' $file_obj |sed 's/\"//g'

But I got:

visible:true
http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js></script>

How can I do this? Thanks a lot.

Upvotes: 0

Views: 119

Answers (5)

Jotne

Reputation: 41456

Here is another GNU awk solution (GNU-specific because RS contains multiple characters):

awk -v RS="http" -F'[")]' 'NR>1{print RS$1}' file
http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js

Upvotes: 0

Aleks-Daniel Jakimenko-A.

Reputation: 10653

Very short grep solution:

grep -Po '(url\(|src=")\K[^")]*' "$file_obj"

\K resets the start of the reported match, so the url( or src=" prefix has to match but is not printed.

Or a bit longer, but safer:

grep -Po 'url\(\K[^)]*|src="\K[^"]*' "$file_obj"
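To illustrate why the second pattern is the safer one, here is a small sketch. It assumes GNU grep built with PCRE support (the -P option), and the sample line is a hypothetical one, not taken from the question: its src value contains parentheses.

```shell
# Hypothetical sample line: the src value itself contains ( and ).
sample='<img src="http://img1.ep.com/a(1).jpg" alt="x">'

# Combined pattern: the shared class [^")] stops at the first ) or " it meets,
# so a ) inside a src value cuts the match short.
short_match=$(printf '%s\n' "$sample" | grep -Po '(url\(|src=")\K[^")]*')

# Split pattern: a src value ends only at ", a url( value ends only at ).
safe_match=$(printf '%s\n' "$sample" | grep -Po 'url\(\K[^)]*|src="\K[^"]*')

printf 'short: %s\nsafe:  %s\n' "$short_match" "$safe_match"
```

With this input the combined pattern stops at http://img1.ep.com/a(1, while the split pattern returns the whole src value.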

Upvotes: 1

mklement0

Reputation: 437753

A streamlined awk solution:

awk -F'url\\(|\\)|src="|"' 'length($2) {print $2}' file
  • -F'url\\(|\\)|src="|"' defines a regular expression to use as field separators (stored in reserved variable FS, which the -F command-line option sets), effectively comprising the following tokens:
    • url(
    • )
    • src="
    • "
    • Note the required double-backslash escaping of ( and ).
      • awk's general string parsing interprets \ escape sequences in a first pass, so the \\ tells it that a literal \ should become part of the resulting regular expression, so that the regex engine sees, for instance, \(, i.e.: a ( character that should be taken literally (rather than starting a capture group).
  • Splitting each line using these tokens as field separators puts the URL into the 2nd field, $2.
  • Since not all input lines contain a URL, pattern length($2) (implied: length($2) > 0) ensures that the print command, {print $2}, is only executed for lines where a URL is found.
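To see the field splitting described above, the following sketch prints every field awk produces for two of the sample lines. The helper name show_fields is made up for this illustration; the separator regex is the one from the command above.

```shell
# show_fields (hypothetical helper): print each field of every input line,
# split with the same separator regex as the solution above.
show_fields() {
  awk -F'url\\(|\\)|src="|"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

# Line that starts with a separator: $1 is empty, $2 holds the URL.
printf '%s\n' 'url(http://style.ep.com/image/control/flash1-tab.gif)' | show_fields

# Line with text before src=": $1 is that leading text, $2 holds the URL.
printf '%s\n' '<img src="http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg" alt="god">' | show_fields
```

In both cases the URL lands in $2, which is why the solution only ever prints that field.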

Caveat: This won't work with URLs that have embedded ) characters, but that's rare in practice.
To fix this, use the following instead, which treats ) as a separator only when it is followed by a blank or the end of the line:

awk -F'url\\(|\\)([[:blank:]]|$)|src="|"' 'length($2) {print $2}' file

Upvotes: 0

Ed Morton

Reputation: 203502

$ sed -r -n -e 's/url\(([^)]+).*/\1/p' file
http://style.ep.com/image/control/flash1-tab.gif

$ sed -r -n -e 's/.*src="([^"]+).*/\1/p' file
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js

$ sed -r -n -e 's/url\(([^)]+).*/\1/p' -e 's/.*src="([^"]+).*/\1/p' file
http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js

Upvotes: 0

Avinash Raj

Reputation: 174706

Try grep with the -oP options:

$ grep -oP '(?<=url\()[^)]*|(?<=src=\")[^"]*' file
http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js

With awk:

$ awk -F\( '/^url/{sub(/.$/,"",$2); print $2}/src=/{split($0,a,"\""); print a[2]}' file
http://style.ep.com/image/control/flash1-tab.gif
http://img1.ep.com/4667/product/s-50f8f86cf0822.jpg.jpg
http://img1.ep.com/4667/codeFromLink.js

Upvotes: 0
