Reputation: 649

awk matching pattern in file

I am stuck with awk.

I have a file with the following structure:

<package author=".." label=".." url="..">
<package author=".." label=".." url="..">
...
<package author=".." label=".." url="..">

As output, I want to get a list of only the URLs.

How can I do this with awk?

I thought it should be something like:

awk '/url="(.*)"/{print $0}' 123

However, it doesn't work: it prints the whole matching line instead of just the URL.

Thank you.

Upvotes: 0

Answers (5)

Vijay

Reputation: 67211

perl -lne 'print $1 if(/url=\"([^\"]*)\"/)' your_file

Upvotes: 0

Jotne

Reputation: 41446

Another awk solution:

$ cat file
<package author=".." label=".." url="https://www.cisco.com">
<package author=".." label=".." url="http://www.google.com/search">

$ awk -F\" '/url/ {print $2}' RS=" " file
https://www.cisco.com
http://www.google.com/search

Upvotes: 0

Ed Morton

Reputation: 203169

With GNU awk:

awk '{print gensub(/.*url="([^"]+).*/,"\\1","")}' file
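gensub() is a GNU awk extension. A minimal sketch of a portable alternative, assuming a POSIX awk and the one-element-per-line input from the question, uses match() together with the RSTART and RLENGTH variables it sets. The helper name extract_urls is hypothetical and only wraps the command for reuse.

```shell
#!/bin/sh
# Sketch: POSIX awk alternative to gensub(), which is GNU-only.
# extract_urls is a hypothetical helper name; input is assumed to be
# one <package ... url="..."> element per line.
extract_urls() {
    awk '
        # match() sets RSTART/RLENGTH for the first url="..." on the line
        match($0, /url="[^"]*"/) {
            # skip the 5 chars of url=" and drop the closing quote
            print substr($0, RSTART + 5, RLENGTH - 6)
        }
    ' "$@"
}
```

Usage: `extract_urls file`. One behavioral difference: lines with no url="..." produce no output here, whereas the gensub() version prints them unchanged.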

Upvotes: 2

fedorqui

Reputation: 289495

If you want to get the URL value, grep can be your friend:

$ cat a
<package author=".." label=".." url="thisis an url">
<package author=".." label=".." url="hello">
$ grep -Po '(?<=url=\")[^"]+' a
thisis an url
hello

This prints everything after url=" (which is not included in the output) up to, but not including, the next double quote ".
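The -P option (Perl-compatible regexes, needed for the (?<=...) lookbehind) is not part of POSIX grep and is mainly a GNU grep feature, so it may be missing elsewhere. A sketch that avoids it, assuming a grep that supports -o (GNU and BSD grep do, though POSIX does not require it), extracts the whole url="..." match and lets cut keep the text between the quotes. extract_urls is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch without PCRE lookbehind. Assumes grep supports -o (GNU/BSD).
# extract_urls is a hypothetical helper name.
extract_urls() {
    # -o prints only the matched url="..." text, one match per line;
    # cut then keeps the 2nd field when splitting on double quotes.
    grep -o 'url="[^"]*"' "$@" | cut -d'"' -f2
}
```

Like the lookbehind version, this keeps spaces inside the URL value, because only the double quotes delimit it.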

Upvotes: 2

umläute

Reputation: 31264

Your awk command only filters the lines that contain the given pattern (url=...) and prints them whole; since all your lines contain that string, it prints every line. To extract the value, you could split the 4th field on double quotes, e.g.:

awk '/url="/ {split($4, A, "\""); print A[2]}' file
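Taking $4 assumes the url attribute is always the fourth whitespace-separated field, which breaks if an author or label value contains a space or the attributes are reordered. A sketch that scans every field for the one starting with url= instead, still assuming the URL itself contains no whitespace; extract_urls is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: locate the url= field by pattern instead of fixed position $4.
# Assumes the URL contains no whitespace.
# extract_urls is a hypothetical helper name.
extract_urls() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^url="/) {          # field starts with url="
                split($i, parts, "\"")    # parts[2] is the quoted value
                print parts[2]
            }
    }' "$@"
}
```

For example, with author="A B" the fields shift by one, so $4 would be the label attribute, but the loop still finds the url= field.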

Using sed is probably much easier:

sed -e 's|^.*url="\([^"]*\)".*$|\1|g' file
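Because sed prints every line by default, lines without a url attribute would pass through the command above unchanged. A sketch of a variant, assuming a POSIX sed: -n turns off automatic printing and the p flag prints only lines where the substitution succeeded. extract_urls is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print only lines where the url="..." substitution matched.
# extract_urls is a hypothetical helper name.
extract_urls() {
    # -n: no automatic printing; p: print the line after a successful s///
    sed -n 's|^.*url="\([^"]*\)".*$|\1|p' "$@"
}
```

Note that the greedy ^.* means that if a line contained several url= attributes, the last one would be extracted.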

Upvotes: 0
