I have to extract a few fields from the HTML input below using bash only.
HTML input:
<a href="/something/somemorething/page?id=1234425">SOMETEXT</a>
I have to extract the id value and SOMETEXT from the above input.
I am hoping that grep with some regex will work out.
For the id value I am using the following regex, which gives me the correct results:
"id=[0-9]*"
grep -o 'id=[0-9]*' index.html | head -n 5
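If you only want the number rather than the whole `id=1234425` match, one option is to pipe the grep output through cut. This is a minimal sketch that keeps your regex and, for self-containment, reads the sample line from a variable named `html` (my placeholder) instead of index.html:

```shell
# Sample input from the question; in practice this would come from index.html.
html='<a href="/something/somemorething/page?id=1234425">SOMETEXT</a>'

# grep -o prints only the matching part ("id=1234425");
# cut -d= -f2 then keeps the field after the "=" delimiter.
id=$(printf '%s\n' "$html" | grep -o 'id=[0-9]*' | cut -d= -f2)
echo "$id"   # prints 1234425
```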
But I am not sure what regex I should use to grab the text up to the next </a>.
Thanks in advance.
(?<=>).*?(?=<)
You can use this with grep -P, since it uses lookarounds, which are supported by Perl-compatible regular expressions. See demo:
https://regex101.com/r/fM9lY3/21
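A minimal sketch of that pattern on the sample line, assuming GNU grep built with PCRE support (BSD/macOS grep has no -P). The variable names `html` and `text` are placeholders:

```shell
html='<a href="/something/somemorething/page?id=1234425">SOMETEXT</a>'

# -o: print only the matched text; -P: Perl-compatible regex, needed for
# the lookbehind (?<=>), the lookahead (?=<) and the lazy .*?
text=$(printf '%s\n' "$html" | grep -oP '(?<=>).*?(?=<)')
echo "$text"   # prints SOMETEXT
```

Note that on a line with several tags this prints the text between every `>` and the following `<`, not only the link text.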
The regex you have in your OP ("id=[0-9]*") looks like it worked in your case, but a better approach is to home in on the anchor tags themselves.
Here is a regex to extract the id value:
<a.*?id=(\d.*?)">
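grep -o prints the whole match rather than just the capture group, so to use this pattern from the command line one option (my substitution, not part of the answer) is GNU grep -P with `\K`, which discards everything matched before it, plus a lookahead for the closing `">`. A sketch assuming a PCRE-enabled GNU grep:

```shell
html='<a href="/something/somemorething/page?id=1234425">SOMETEXT</a>'

# The (\d.*?) group is replaced by \K followed by \d.*? and a lookahead,
# so grep -o prints only the id digits.
id=$(printf '%s\n' "$html" | grep -oP '<a.*?id=\K\d.*?(?=">)')
echo "$id"   # prints 1234425
```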
And here is a regex to extract the contents inside the <a> tag:
<a.*?">(.*?)<\/a>
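Since the question asks for bash only, here is a sketch that captures both fields with bash's own `[[ =~ ]]` and `BASH_REMATCH`, without grep. Bash's `=~` uses POSIX ERE, which has no lazy `.*?` and no `\d`, so the two patterns above are rewritten with negated character classes. The heredoc lines other than the first are made-up sample data, and the `ids`/`texts` arrays are placeholder names:

```shell
#!/usr/bin/env bash
# ERE version of the answer's patterns: [^>]* and [^<]* stand in for .*?
# Keeping the regex in a variable avoids quoting issues inside [[ ]].
re='<a[^>]*id=([0-9]+)">([^<]*)</a>'

ids=()
texts=()
while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
        ids+=("${BASH_REMATCH[1]}")    # capture group 1: the id value
        texts+=("${BASH_REMATCH[2]}")  # capture group 2: the link text
        printf '%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
done <<'EOF'
<a href="/something/somemorething/page?id=1234425">SOMETEXT</a>
<p>no link on this line</p>
<a href="/something/other/page?id=42">OTHER TEXT</a>
EOF
```

This only picks up the first matching link on each line; to read a real file, replace the heredoc with `done < index.html`.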