sed awk get substring instead - regex

Question

Hi how to use sed or awk to extract substring that matches a regular expression.

I have seen several modify or change substring but I just want to get the matching part.

my data looks like below:

http://www.A.com/sitemap1.gz
http://www.A.com/sitemap2.gz
http://www.A.com/sitemap3.gz
http://www.A.com/sitemap4.gz
http://www.A.com/sitemap5.gz
http://www.A.com/sitemap6.gz
http://www.A.com/sitemap7.gz
http://www.A.com/sitemap8.gz

Output should look like

http://www.A.com/sitemap1.gz
http://www.A.com/sitemap2.gz
http://www.A.com/sitemap3.gz
....

I tried

cat data | sed 's/'http.*gz'//'

but this command actually removes exactly the part that I want to keep. Thanks

Chris Seymour · Accepted Answer

A simple grep will do with the -o option:

$ grep -o 'http[^<]*' file
http://www.A.com/sitemap1.gz
http://www.A.com/sitemap2.gz
http://www.A.com/sitemap3.gz
http://www.A.com/sitemap4.gz
http://www.A.com/sitemap5.gz
http://www.A.com/sitemap6.gz
http://www.A.com/sitemap7.gz
http://www.A.com/sitemap8.gz

With awk you could do:

$ awk -F'[<>]' '{print $3}' file
http://www.A.com/sitemap1.gz
http://www.A.com/sitemap2.gz
http://www.A.com/sitemap3.gz
http://www.A.com/sitemap4.gz
http://www.A.com/sitemap5.gz
http://www.A.com/sitemap6.gz
http://www.A.com/sitemap7.gz
http://www.A.com/sitemap8.gz

sed awk get substring instead - regex

Answers (2)

Related Questions