XPath to select all paragraphs between two headers?

Question

I am trying to select all p elements located between two h5 elements. The first h5 text starts with "SUBJECT" and the second h5 text is "Tender’s Files,".

You may see the picture attached as well.

I don't want to have other p elements which are coming after the second h5.

I have tried the following XPath:

//p[preceding-sibling::h5//*[contains(text() , 'SUBJECT')]  and following-sibling::h5//*[contains(text() , 'Tender’s Files,')]]

I took the idea from [enter link description here][2], but I could not get the right paragraphs. It still selects other paragraphs after the second h5.






Tender Title: Testing of Non-Fortified Wheat Flour in NES




Tender No: SYRIA-TA-2021-005


Location: North East Syria




Tender Package Available from: 2021-01-10




Deadline for Offer Submission: 2021-01-18 17:00 (Iraqi Time)







 



SUBJECT: Testing of Non-Fortified Wheat Flour in NES
Our organization, a non-profit organization, provides humanitarian assistance to “people in need”, is seeking quotations from eligible contractors to Testing of Non-Fortified Wheat Flour in NES. Our organization anticipates awarding Multiple or Single contract(s) as a result of this Solicitation. Our organization reserves the right to award more or none under this RFQ.
All bids shall be submitted via e-mail to Syr-tendering@blumont.org as PDF format and clearly written the subject of the tender This RFQ is in no way obligates our organization Our organization to award a contract nor does it commit our organization to pay any cost incurred in the preparation and submission of a proposal.
Our organization bears no responsibility for data errors resulting from transmission or conversion processes.
 

To help us with our procurement effort, please indicate in your email where (ngotenders.net) you saw this tender/procurement notice.

Sincerely
Procurement Committee
Tender’s Files,
5ffb04ba52a49-005-announcement.zip, 

الموضوع: فحص الطحين الغير مدعم في شمال شرق سوريا. 
منظمتنا و هي منظمة غير ربحية تعمل لخدمة المنكوبين في العالم و تسعى للحصول على عروض أسعار من المقاولين المؤهلين لغرض الموضوع: فحص الطحين الغير مدعم في شمال شرق سوريا. وتتوقع منظمتنا منح (عقود) متعددة أو مفردة نتيجة لهذا الطلب. وتحتفظ منظمتنا بالحق في منح التعاقد بأكثر أو أقل من المتوقع للطلب أعلاه.
لهذا الطلب. وتحتفظ منظمتنا بالحق في منح التعاقد بأكثر أو أقل من المتوقع للطلب أعلاه.
 يجب على جميع مقدمي العطاءات تقديم العروض عبر الايميل :عبر الايميل: Syr-tendering@blumont.org و بصيغة PDF و تم التوضيح للموضوع المناقصة بان المنظمة لا تلتزم بأي حال من الأحوال بمنح العقد كما أن المنظمة لا تلتزم بدفع أي تكاليف متكبدة في إعداد وتقديم العرض.
كما ان منظمتنا لا تتحمل أية مسؤولية عن أي أخطاء في البيانات الناتجة عن عمليات النقل أو التحويل او المحادثة.

مع فائق الاحترام  و التقدير
لجنة المشتريات
Tender’s Files,
5ffb04ba52a49-005-announcement.zip,

the page source code.

enter link description here

kjhughes · Accepted Answer

Using techniques from the following Q/A:

The following XPath,

//p[    preceding-sibling::h5[starts-with(normalize-space(),'SUBJECT:')]
    and following-sibling::h5[normalize-space()='Tender’s Files,']]

will select all p elements between your two targeted headlines, as requested.
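To make the two predicates concrete, here is a minimal sketch that emulates them in plain Python. It is not an XPath engine: the standard library's `xml.etree.ElementTree` does not support the `preceding-sibling` and `following-sibling` axes, so the sibling checks are written out by hand. The markup and the helper names (`normalize_space`, `paragraphs_between`) are assumptions made for illustration, not the real page.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page has more elements.
MARKUP = """
<div>
  <p>Tender No: SYRIA-TA-2021-005</p>
  <h5> SUBJECT:  Testing of Non-Fortified Wheat Flour in NES </h5>
  <p>first body paragraph</p>
  <p>second body paragraph</p>
  <h5>Tender’s Files,</h5>
  <p>5ffb04ba52a49-005-announcement.zip,</p>
</div>
"""


def normalize_space(element):
    """Approximate XPath normalize-space(): take the element's string value,
    collapse runs of space/tab/CR/LF to one space, and trim both ends."""
    text = "".join(element.itertext())
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def paragraphs_between(parent):
    """Keep each p that has a 'SUBJECT:' h5 among its earlier siblings
    and a 'Tender’s Files,' h5 among its later siblings."""
    children = list(parent)
    selected = []
    for i, node in enumerate(children):
        if node.tag != "p":
            continue
        has_start_before = any(
            sib.tag == "h5" and normalize_space(sib).startswith("SUBJECT:")
            for sib in children[:i]
        )
        has_end_after = any(
            sib.tag == "h5" and normalize_space(sib) == "Tender’s Files,"
            for sib in children[i + 1:]
        )
        if has_start_before and has_end_after:
            selected.append(node)
    return selected


root = ET.fromstring(MARKUP)
texts = [normalize_space(p) for p in paragraphs_between(root)]
print(texts)
```

The paragraph before the first heading fails the first check and the paragraph after the second heading fails the second check, so only the two paragraphs in between remain.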

Update after OP included actual markup:

Your actual markup includes duplicate "Tender’s Files," headings. The above XPath will select through to the last such heading.

If you want to select through only the first such heading, use this XPath instead:

//p[    preceding-sibling::h5[starts-with(normalize-space(),'SUBJECT:')]
    and following-sibling::h5[normalize-space()='Tender’s Files,']
    and not(preceding-sibling::h5[normalize-space()='Tender’s Files,'])]
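The effect of the extra `not(...)` clause can be sketched in plain Python by comparing both variants on markup with a repeated end heading. This is an emulation written by hand with the standard library, not a real XPath evaluation, and the markup, the `select` function, and its `stop_at_first_end` flag are hypothetical names chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup in which the end heading appears twice.
MARKUP = """
<div>
  <h5>SUBJECT: Testing of Non-Fortified Wheat Flour in NES</h5>
  <p>paragraph in the first block</p>
  <h5>Tender’s Files,</h5>
  <p>paragraph in the second block</p>
  <h5>Tender’s Files,</h5>
  <p>trailing paragraph</p>
</div>
"""

END = "Tender’s Files,"


def norm(element):
    """Approximate XPath normalize-space() on the element's string value."""
    text = "".join(element.itertext())
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def select(parent, stop_at_first_end):
    """Emulate the sibling predicates; when stop_at_first_end is True,
    also reject any p that already has an end heading before it."""
    children = list(parent)
    result = []
    for i, node in enumerate(children):
        if node.tag != "p":
            continue
        before, after = children[:i], children[i + 1:]
        start_before = any(
            s.tag == "h5" and norm(s).startswith("SUBJECT:") for s in before
        )
        end_after = any(s.tag == "h5" and norm(s) == END for s in after)
        end_before = any(s.tag == "h5" and norm(s) == END for s in before)
        if start_before and end_after and not (stop_at_first_end and end_before):
            result.append(norm(node))
    return result


root = ET.fromstring(MARKUP)
through_last = select(root, stop_at_first_end=False)
through_first = select(root, stop_at_first_end=True)
print(through_last)
print(through_first)
```

Without the extra check, the paragraph in the second block still qualifies because another end heading follows it; with the check, it is rejected because an end heading already precedes it.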
