Reputation: 118
I am trying to fetch some IDs from a URL.
In my script I request the URL in a while loop with wget and save the output to a file.
Then, in the same loop, I grep for XYZ User ID: plus the 3 lines after that string, and save the result to another file.
When I open this output file, I find the following lines:
< p >XYZ User ID:< /p>
< /td >
< td>
< p>2989288174< /p>
So, using grep or anything else, how can I print the following output?
XYZ User ID:2989288174
Upvotes: 2
Views: 202
Reputation: 22428
This should work (sed with extended regex):
sed -nr 's#<\s*p\s*>([^>]*)<\s*/\s*p\s*>#\1#p' file | tr -d '\n'
Output:
XYZ User ID:2989288174
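For readers who want to try this without the real page, here is a minimal, self-contained sketch of the same idea. It assumes the four lines from the question as input and writes them to a temporary file (the variable names sample and result are only illustrative). It uses POSIX character classes and -E instead of \s and -r, which should behave the same way on GNU sed and also run on BSD sed.

```shell
# Sample input copied from the question (irregular spaces inside the tags).
sample=$(mktemp)
cat > "$sample" <<'EOF'
< p >XYZ User ID:< /p>
< /td >
< td>
< p>2989288174< /p>
EOF

# -n suppresses default printing; the trailing p flag prints only lines
# where the substitution matched, i.e. lines holding a <p>...</p> pair.
# The capture group keeps just the text between the tags.
result=$(sed -nE 's#<[[:space:]]*p[[:space:]]*>([^<]*)<[[:space:]]*/[[:space:]]*p[[:space:]]*>#\1#p' "$sample" | tr -d '\n')

printf '%s\n' "$result"
rm -f "$sample"
```

Note that tr -d '\n' joins every matched line into one, so with several IDs in the file all of them end up on a single line.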
Upvotes: 1
Reputation: 14955
Assuming a constant tag pattern:
<p>XYZ User ID:</p>
</td>
<td>
<p>2989288174</p>
grep should be the best way:
grep -oP '(?<=p>)([^>]+?)(?=<\/p)' outputfile | while IFS= read -r user; do
    IFS= read -r id
    # No space between the two parts, to match the requested output.
    echo "$user$id"
done
Note that look-behind expressions cannot be of variable length. That means you cannot use quantifiers such as ?, *, +, etc. or alternation of different-length items inside them.
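One possible way around that restriction, sketched here as an assumption rather than as part of the original answer, is PCRE's \K escape: everything matched before \K is dropped from the reported match, so the part in front of it may contain quantifiers. The example assumes GNU grep built with PCRE support (-P) and reuses the spaced-out tags from the question; sample and result are illustrative names.

```shell
# Sample input with irregular spacing inside the tags, as in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
< p >XYZ User ID:< /p>
< /td >
< td>
< p>2989288174< /p>
EOF

# <\s*p\s*> matches the opening tag with optional spaces; \K discards it
# from the output, and [^<]+ keeps the text up to the next tag.
# Assumes GNU grep with -P available.
result=$(grep -oP '<\s*p\s*>\K[^<]+' "$sample" | tr -d '\n')

printf '%s\n' "$result"
rm -f "$sample"
```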
For variable-length tags, awk could be well suited, as long as each tag pair sits on a single line:
awk '/User ID/{print ""}/p *>/{printf "%s", $3}' FS='(p *>|<)' outputfile
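If the file holds several IDs, a variant that prints one complete record per line may be easier to post-process. The following is only a sketch under the same assumption that each tag pair sits on one line. It remembers the label when it sees User ID and prints it together with the next paragraph's text. The second ID in the sample is made up for illustration, and sample, label, and result are hypothetical names.

```shell
# Two records in the layout shown above; the second ID is invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>XYZ User ID:</p>
</td>
<td>
<p>2989288174</p>
<p>XYZ User ID:</p>
</td>
<td>
<p>1234567890</p>
EOF

# With this field separator, the text between <p> and </p> lands in $3.
# Store the label line, then print label and value together once the
# next <p> line arrives, and reset for the following record.
result=$(awk -F'(p *>|<)' '
    /User ID/            { label = $3; next }
    /p *>/ && label != "" { print label $3; label = "" }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Unlike the one-liner, this ends every record with a newline and does not emit a leading blank line.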
Upvotes: 3