Extract both href and text on same line using Xidel, specific links only

Question

I am trying to extract the link (href) and text inside the tag for a number of links in an html page.

I only want specific links, which I match by a substring.

Example of my html:

<a href="/this/dir/1234">This should be 1234</a> some other html
<a href="/this/dir/1236">This should be 1236</a> some other html
<a href="…">Not important link</a> some other html

I am using Xidel, which lets me avoid regular expressions. It seems to be the simplest tool for the job.

What I have so far:

xidel -e "//a/(@href[contains(.,'/this/dir')],text())"

It basically works, but two issues remain:

What is the recommended way to get output like this?

/this/dir/1234  ; This should be 1234
/this/dir/1236  ; This should be 1236

Appreciate any feedback / tips.

edit:

The solution provided by Martin was 99% there. Newlines were not output, so I am using awk to replace a dummy text with newlines.

Note: I am on Windows.

xidel myhtml.htm -e "string-join(//a[contains(@href, '/this/dir')]!(@href || ' ; ' || .), 'XXX')" | awk -F "XXX" "{$1=$1}1" "OFS=\n"

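For anyone checking the awk step on its own, here is a minimal sketch of what it does, using POSIX shell quoting rather than the Windows quoting above. The sample string and the helper name split_links are made up for illustration; the string only mimics what the string-join expression would emit for the two matching links.

```shell
# Assumed sample input: two records joined by the dummy separator XXX.
joined='/this/dir/1234 ; This should be 1234XXX/this/dir/1236 ; This should be 1236'

# Hypothetical helper wrapping the awk step from the command above.
# -F "XXX"   splits each input line into fields on the dummy text.
# $1=$1      forces awk to rebuild the record, joining fields with OFS.
# 1          is an always-true pattern, so the rebuilt record is printed.
# OFS='\n'   is an operand assignment; awk expands \n to a newline.
split_links() {
  awk -F "XXX" '{$1=$1}1' OFS='\n'
}

printf '%s\n' "$joined" | split_links
```

Each XXX-separated piece ends up on its own line, which is the one-link-per-line layout asked for in the question.
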
Answers (1)