Kunal

Reputation: 607

grep text before string - regex

I have to extract a few fields from the HTML input below using bash only.

HTML input

<a href="/something/somemorething/page?id=1234425">SOMETEXT</a>

I have to extract the id value and SOMETEXT from the input above.

I am hoping that grep with the right regex will work. For the id value I am using the following regex:

"id=[0-9]*"

which gives me the correct results:

grep -o 'id=[0-9]*' index.html | head -n 5
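A minimal sketch, assuming you want only the number rather than the whole `id=1234425` match. `extract_ids` is a hypothetical helper name used for illustration; it reads HTML on stdin. The pattern is tightened to `id=[0-9][0-9]*` so that an `id=` with no digits after it is not printed (with `[0-9]*`, grep -o would still print the bare `id=`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the numeric id values found on stdin.
extract_ids() {
  # -o prints each match on its own line; [0-9][0-9]* requires at least one digit.
  # cut splits on '=' and keeps the second field, dropping the "id=" prefix.
  grep -o 'id=[0-9][0-9]*' | cut -d= -f2
}

# Usage with the sample line from the question (prints 1234425):
printf '%s\n' '<a href="/something/somemorething/page?id=1234425">SOMETEXT</a>' | extract_ids
```

Against the original file this would be `extract_ids < index.html | head -n 5`.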

But I am not sure what regex I should use to grab the text up to the next </a>.

Thanks in advance.

Upvotes: 2

Views: 380

Answers (2)

vks

Reputation: 67978

(?<=>).*?(?=<)

You can use this with grep -P, since it uses lookarounds, which Perl-compatible regexes support. See the demo:

https://regex101.com/r/fM9lY3/21
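A quick way to try the lookaround regex from the shell, assuming GNU grep built with PCRE support (not every grep accepts -P; BSD grep, for example, may not). The variable `line` simply holds the sample input from the question.

```shell
#!/usr/bin/env bash
line='<a href="/something/somemorething/page?id=1234425">SOMETEXT</a>'

# (?<=>) : the match must start right after a '>'
# .*?    : lazily consume characters...
# (?=<)  : ...stopping just before the next '<'
# -o prints only the matched text, so this prints SOMETEXT
printf '%s\n' "$line" | grep -oP '(?<=>).*?(?=<)'
```

Because the pattern only looks for text between a `>` and the next `<`, on a full page it will also match text inside other elements, not just `<a>` links.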

Upvotes: 2

Tim Biegeleisen

Reputation: 522007

The regex in your OP ("id=[0-9]*") seems to work in your case, but a better approach is to home in on the anchor tags themselves.

Here is a regex to extract the id value (it uses Perl-style syntax such as \d and lazy quantifiers, so with grep you need -P):

<a.*?id=(\d+)">
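Note that grep -o prints the entire match, not a capture group. One way to print only the id with grep -P is PCRE's `\K` (discard everything matched so far) combined with a lookahead. This is a sketch assuming GNU grep with -P, and `line` holds the sample input from the question.

```shell
#!/usr/bin/env bash
line='<a href="/something/somemorething/page?id=1234425">SOMETEXT</a>'

# <a.*?id=  : lazily skip from the tag start to "id="
# \K        : drop that prefix from the reported match
# \d+       : match only the digits
# (?=">)    : require the closing quote and '>' without including them
# Prints 1234425
printf '%s\n' "$line" | grep -oP '<a.*?id=\K\d+(?=">)'
```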

And here is a regex to extract the contents of the <a> tag:

<a.*?">(.*?)<\/a>
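Since the question asks for bash only, here is a minimal sketch that skips grep and uses bash's own `[[ =~ ]]` operator, whose capture groups land in the `BASH_REMATCH` array. Bash regexes are POSIX ERE, which has no `\d` and no lazy `*?`, so the pattern below is an adaptation of the regexes above rather than a copy: negated character classes (`[^>]*`, `[^<]*`) stand in for the lazy matches. It handles one line at a time.

```shell
#!/usr/bin/env bash
line='<a href="/something/somemorething/page?id=1234425">SOMETEXT</a>'

# Keep the pattern in a variable: quoting it inline after =~ would make
# bash treat it as a literal string instead of a regex.
re='<a [^>]*id=([0-9]+)">([^<]*)</a>'

if [[ $line =~ $re ]]; then
  id=${BASH_REMATCH[1]}     # first capture group: the digits after id=
  text=${BASH_REMATCH[2]}   # second capture group: the link text
  printf 'id=%s text=%s\n' "$id" "$text"   # prints id=1234425 text=SOMETEXT
fi
```

To process a whole file you could wrap the test in a `while IFS= read -r line; do ...; done < index.html` loop.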

Upvotes: 1
