Reputation: 649
I am stuck with awk.
I have a file with the following structure:
<package author=".." label=".." url="..">
<package author=".." label=".." url="..">
...
<package author=".." label=".." url="..">
As output, I want to get a list of only the URLs.
How can I do this with awk?
I thought it should be something like:
awk '/url="(.*)"/{print $0}' 123
However, it doesn't work.
Thank you.
Upvotes: 0
Views: 348
Reputation: 41446
Another awk approach:
cat file
<package author=".." label=".." url="https://www.cisco.com">
<package author=".." label=".." url="http://www.google.com/search">
awk -F\" '/url/ {print $2}' RS=" " file
https://www.cisco.com
http://www.google.com/search
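For readers wondering why this works: RS=" " makes awk treat every space-separated chunk as its own record, and -F\" splits the url="..." chunk on double quotes, so $2 is the value. Below is a minimal sketch of that idea, assuming no URL contains a space (the temporary file is only a stand-in for the real input):

```shell
# Hypothetical sample input, mirroring the structure shown in the question.
tmp=$(mktemp)
printf '%s\n' \
  '<package author=".." label=".." url="https://www.cisco.com">' \
  '<package author=".." label=".." url="http://www.google.com/search">' > "$tmp"

# RS=" " : records are separated by spaces instead of newlines.
# -F\"   : fields inside a record are separated by double quotes.
# /url=/ : keep only the record holding the url attribute; $2 is its value.
urls=$(awk -F\" '/url=/ {print $2}' RS=" " "$tmp")
printf '%s\n' "$urls"
rm -f "$tmp"
```

A value that contains a space would be cut into two records, so this variant only suits inputs like the one above.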
Upvotes: 0
Reputation: 203169
With GNU awk:
awk '{print gensub(/.*url="([^"]+).*/,"\\1",1)}' file
Upvotes: 2
Reputation: 289495
If you want to get the url value, grep can be your friend:
$ cat a
<package author=".." label=".." url="thisis an url">
<package author=".." label=".." url="hello">
$ grep -Po '(?<=url=\")[^"]+' a
thisis an url
hello
This will show everything from url=" (not included) up to the next double quote ".
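Note that -P (PCRE) is a GNU grep feature and may not be available elsewhere, for example in BSD grep. A rough equivalent without lookbehind, sketched under the assumption that the values never contain an escaped double quote, is to match the whole attribute with -o and then cut out the quoted part:

```shell
# Hypothetical sample input, same shape as file "a" above.
tmp=$(mktemp)
printf '%s\n' \
  '<package author=".." label=".." url="thisis an url">' \
  '<package author=".." label=".." url="hello">' > "$tmp"

# grep -o prints only the matched text, i.e. url="...".
# cut takes the second double-quote-delimited field, which is the value.
values=$(grep -o 'url="[^"]*"' "$tmp" | cut -d'"' -f2)
printf '%s\n' "$values"
rm -f "$tmp"
```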
Upvotes: 2
Reputation: 31264
Your awk command only filters the lines that contain the given pattern (url=...); since all your lines contain the string, it prints all of them.
To extract the information, you could split the 4th field on the double quotes, e.g.:
awk '/url="(.*)"/{split($4, A, "\""); print A[2]}'
Using sed is probably much easier:
sed -e 's|^.*url="\([^"]*\)".*$|\1|g'
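Staying with awk, splitting $4 relies on url being the fourth whitespace-separated field and on the value containing no spaces. A sketch of a position-independent variant uses the POSIX match() function with RSTART and RLENGTH; the example.com and example.org URLs and the temporary file are made up for illustration:

```shell
# Hypothetical input: one value with a space, one with url as the first attribute.
tmp=$(mktemp)
printf '%s\n' \
  '<package author=".." label=".." url="http://example.com/a b">' \
  '<package url="http://example.org" author=".." label="..">' > "$tmp"

# match() sets RSTART/RLENGTH to the position and length of url="...".
# Skip the 5 leading characters (url=") and drop the closing quote.
urls=$(awk 'match($0, /url="[^"]*"/) {
  print substr($0, RSTART + 5, RLENGTH - 6)
}' "$tmp")
printf '%s\n' "$urls"
rm -f "$tmp"
```

Lines without a url attribute are skipped, because match() returns 0 for them and the action does not run.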
Upvotes: 0