user10756

Reputation: 649

awk matching pattern in file

I am stuck with awk

I have a file with the following structure:

<package author=".." label=".." url="..">
<package author=".." label=".." url="..">
...
<package author=".." label=".." url="..">

As output, I want only the list of URLs.

How can I do this with awk?

I thought it should be something like:

awk '/url="(.*)"/{print $0}' 123

However, it doesn't work.

Thank you.

Upvotes: 0

Views: 348

Answers (5)

Vijay

Reputation: 67211

perl -lne 'print $1 if(/url=\"([^\"]*)\"/)' your_file

Upvotes: 0

Jotne

Reputation: 41446

Another awk solution:

cat file
<package author=".." label=".." url="https://www.cisco.com">
<package author=".." label=".." url="http://www.google.com/search">

awk -F\" '/url/ {print $2}' RS=" " file
https://www.cisco.com
http://www.google.com/search

Upvotes: 0

Ed Morton

Reputation: 203169

With GNU awk:

awk '{print gensub(/.*url="([^"]+).*/,"\\1","")}' file
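gensub() is a GNU awk extension, so the command above will not run under mawk or the BSD/macOS awk. A minimal sketch of the same extraction using only POSIX awk (match(), RSTART and RLENGTH), wrapped in a shell function whose name, urls_posix, is purely hypothetical. It prints every url="..." value on a line, not just one. One assumption to be aware of: the regex is not anchored to an attribute boundary, so an attribute whose name merely ends in "url" would also be picked up.

```shell
# urls_posix: hypothetical helper; prints each url="..." value using POSIX awk only
urls_posix() {
  awk '{
    line = $0
    # match() sets RSTART/RLENGTH for the leftmost url="..." on the line
    while (match(line, /url="[^"]*"/)) {
      # skip the 5 leading chars (url=") and drop the closing quote
      print substr(line, RSTART + 5, RLENGTH - 6)
      # keep searching in the remainder of the line
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}
```

Usage: `urls_posix file`.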

Upvotes: 2

fedorqui

Reputation: 289495

If you want to get the url value, grep can be your friend:

$ cat a
<package author=".." label=".." url="thisis an url">
<package author=".." label=".." url="hello">
$ grep -Po '(?<=url=\")[^"]+' a
thisis an url
hello

This prints everything after url=" (not included) up to the next double quote ".
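-P (Perl-compatible regexes) is a GNU grep option and is not available in every grep build (for example, the stock macOS grep). A minimal sketch that avoids it, assuming your grep supports -o and -h (GNU and BSD grep do, although neither is in POSIX): grep -o prints only the matched text, and sed strips the url=" prefix and the closing quote. The function name urls_grep_sed is hypothetical.

```shell
# urls_grep_sed: hypothetical helper; extract url values without grep -P
urls_grep_sed() {
  # -o: print only the matching part; -h: no filename prefix with several files
  grep -h -o 'url="[^"]*"' "$@" |
    sed -e 's/^url="//' -e 's/"$//'   # remove url=" and the trailing quote
}
```

Usage: `urls_grep_sed a`.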

Upvotes: 2

umläute

Reputation: 31264

Your awk command only filters the lines that contain the given pattern (url=...). Since every line contains that string, it prints all of them unchanged. The parentheses in /url="(.*)"/ only group; in an awk pattern they do not capture anything you can print. To extract the value, you could split the 4th field on double quotes instead (this assumes the author and label values contain no spaces), e.g.:

awk '/url="(.*)"/{split($4, A, "\""); print A[2]}' file
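Relying on $4 breaks when an earlier attribute value contains a space (as in the "thisis an url" example in another answer) or when the attributes come in a different order. A minimal sketch of a position-independent variant, assuming values never contain escaped double quotes: split the whole line on double quotes, so odd fields hold the text between values (e.g. ' url=') and even fields hold the values. The function name urls_by_quote is hypothetical.

```shell
# urls_by_quote: hypothetical helper; find the value that follows url=
urls_by_quote() {
  awk -F'"' '{
    # odd fields are the text outside quotes, even fields are the values
    for (i = 1; i < NF; i += 2)
      # a field ending in url= (after start or a blank) precedes the url value
      if ($i ~ /(^|[ \t])url=$/)
        print $(i + 1)
  }' "$@"
}
```

Usage: `urls_by_quote file`.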

Using sed is probably much easier (-n together with the p flag prints only the lines where the substitution succeeded):

sed -n 's|^.*url="\([^"]*\)".*$|\1|p' file

Upvotes: 0
