iText HTML to PDF: content runs off the page

I'm trying to convert this piece of HTML, which has no CSS, to PDF:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
</head>
<body>
<div id="184981f8-654a-4e90-a0f5-e75d1edaf2ca" class="act">
    <div id="b995877a-0d3c-439f-984e-f9f809d124a5" class="footnotes">
        <table>
            <tbody>
            <tr>
                <td id="f29aca16-143d-1fc6-8f6a-d2aa116cde25">1</td>
                <td>Ezechiel HAVRENNE is a lecturer at the University of Luxembourg on Investment Funds. Views expressed
                    in this article reflect some of the author’s experience to date on the subject matter. As the
                    Luxembourg investment fund market continues to develop these views may – and will most likely
                    continue to – evolve in one way or another. This article should in no way be construed as legal,
                    business or structuring advice rendered by the author or any other entity, nor should it be
                    construed as reflecting the views of such entity(ies)
                </td>
            </tr>
            <tr>
                <td id="434b1865-a5ea-1f96-b0fa-09ea9e4fb76a">2</td>
                <td>The Preqin Quarterly Update: Private Debt, Q3 2020, 7 October 2020, page 12; <a
                        id="0e11d32d-c25b-65c1-8266-39da10bb62f3"
                        href="https://www.preqin.com/insights/research/quarterly-updates/preqin-quarterly-update-private-debt-q3-2020"
                        target="_blank" class="tech_external" rel="noopener">https://www.preqin.com/insights/research/quarterly-updates/preqin-quarterly-update-private-debt-q3-2020</a>
                    (accessed 15 March 2021). These figures drastically contrast with those reported by Lipper as of
                    October 2016, whereby “<em>the gross AuM of all funds that invest primarily in loan participations
                        was approximately USD 218 billon</em>” as mentioned in IOSCO’s final report; IOSCO
                    FR03/2017, ib., page 4
                </td>
            </tr>
            <tr>
                <td id="6bf035e5-d434-1eec-a550-58147bed84a0">3</td>
                <td>According to EU recommendation 2003/361, 2 factors determine whether a business is an SME: (i) the
                    number of employees and (ii) either turnover or balance sheet total. A medium-sized company has up
                    to 250 employees, a turnover of up to €50 million or a balance sheet total of up to €43 million.
                    A small-sized company has up to 50 employees &amp; a turnover or balance sheet total of up to €10
                    million. A micro-company has up to 10 employees &amp; a turnover or balance sheet total of up to
                    €2 million
                </td>
            </tr>
            <tr>
                <td id="5028557e-4efe-1066-9fd4-28809a6d0653">4</td>
                <td>For instance, one of the driving forces that has led European jurisdictions to consider permitting
                    funds to originate loans was the adoption of the EU regulation on European long-term investment
                    funds allowing funds the origination of loans under certain conditions. As a result, many
                    jurisdictions in Europe now allow loan originations by funds
                </td>
            </tr>
            <tr>
                <td id="cd0ac4df-9139-1c0a-9dd0-c15cca78845a">5</td>
                <td>See IOSCO’s final report FR03/2017, <em>Findings of the Survey on Loan Funds</em>, February 2017,
                    page 4 <a id="76d9ff09-04f9-61a4-a311-2cfee0e19245"
                              href="https://www.iosco.org/library/pubdocs/pdf/IOSCOPD555.pdf" target="_blank"
                              class="tech_external" rel="noopener">https://www.iosco.org/library/pubdocs/pdf/IOSCOPD555.pdf</a>
                    (accessed 13 April 2021)
                </td>
            </tr>
            <tr>
                <td id="a0dd548b-cfa4-182c-9472-624a6be46538">6</td>
                <td>See the Glossary of Summaries published on EUR-Lex, <a id="3052c250-b9c1-60f7-b36c-45ab06665101"
                                                                           href="https://eur-lex.europa.eu/summary/glossary/sme.html"
                                                                           target="_blank" class="tech_external"
                                                                           rel="noopener">https://eur-lex.europa.eu/summary/glossary/sme.html</a>
                    (accessed 13 April 2021) as well as the European Commission’s page titled “<em>Access to finance
                        for SMEs</em>”,<a id="b8b721ff-fd48-67aa-aaac-e5b1d0d02b60"
                                            href="https://ec.europa.eu/growth/access-to-finance_en" target="_blank"
                                            class="tech_external" rel="noopener">
                        https://ec.europa.eu/growth/access-to-finance_en</a> (accessed 13 April 2021)
                </td>
            </tr>
            <tr>
                <td id="d98d8f00-f797-1b37-9540-36713cfdc8a7">7</td>
                <td><em>Ib.</em></td>
            </tr>
            <tr>
                <td id="3868e384-a464-1b26-933a-8ec3a95f86d5">8</td>
                <td>For more information see <a id="dc357707-f043-68ce-a7bc-c9a5d9d86c7d"
                                                href="https://ec.europa.eu/growth/smes/cosme_en" target="_blank"
                                                class="tech_external" rel="noopener">https://ec.europa.eu/growth/smes/cosme_en</a>
                    (accessed 13 April 2021)
                </td>
            </tr>
            <tr>
                <td id="6766e322-fdf8-16b8-99e4-006e43fdecbd">9</td>
                <td>See the European Commission’s page titled “COSME Financial Instruments”, <a
                        id="62cbd917-994d-6388-b0db-786a5c792685"
                        href="https://ec.europa.eu/growth/access-to-finance/cosme-financial-instruments_en"
                        target="_blank" class="tech_external" rel="noopener">https://ec.europa.eu/growth/access-to-finance/cosme-financial-instruments_en</a>
                    (accessed 13 April 2021)
                </td>
            </tr>
            <tr>
                <td id="11773190-b10f-1399-b71f-3a5fcfa5a5fc">10</td>
                <td>Even if the eligibility for participation in the COSME LGF programme was extended to Loan
                    Origination funds it does not appear from the EIF’s register published as at 31 January 2021 that
                    any would have made the list. See<a id="cf5536ce-bff2-6220-9ed7-e4011b938b0e"
                                                        href="https://www.eif.org/what_we_do/guarantees/single_eu_debt_instrument/cosme-loan-facility-growth/cosme_lgf_signatures.pdf"
                                                        target="_blank" class="tech_external" rel="noopener">
                        https://www.eif.org/what_we_do/guarantees/single_eu_debt_instrument/cosme-loan-facility-growth/cosme_lgf_signatures.pdf</a>
                    (accessed 13 April 2021)
                </td>
            </tr>
            <tr>
                <td id="12b455e1-ceff-10b6-ba3d-df5b441fe989">11</td>
                <td>Those associated countries include Iceland, Montenegro, Turkey, the Republic of North Macedonia,
                    Albania, Serbia, Bosnia and Herzegovina, and Kosovo
                </td>
            </tr>
            <tr>
                <td id="d8103a16-44fa-1096-8295-d478456b0117">12</td>
                <td>Connor Hussey, Luxembourg private debt industry grows 36% from 2019, Private Funds CFO, 3 December
                    2020, <a id="0facc75b-6776-606c-b47d-e2025d559bf2"
                             href="https://www.privatefundscfo.com/luxembourg-private-debt-industry-grows-36-2-from-2019"
                             target="_blank" class="tech_external" rel="noopener">https://www.privatefundscfo.com/luxembourg-private-debt-industry-grows-36-2-from-2019</a>/
                    (accessed 13 April 2021). These figures should be in line with the then reality based on the 2017
                    final report of IOSCO whereby it stated that “<em>in Luxembourg, the net AuM of all domestic Loan
                        Funds (i.e., Funds with their primary activity engaged in lending and across various loan
                        activities, encompassing also activities such as microfinance, real estate debt or
                        infrastructure financing) is EUR 37.3 bn, constituting 1% of all domestic Funds</em>”, IOSCO
                    FR03/2017, ib., page 9
                </td>
            </tr>
            <tr>
                <td id="228c3276-de18-1393-9860-66ff5272b741">13</td>
                <td>KPMG – ALFI Private Debt Fund Survey 2020, pages 4 and 5, <br><a
                        id="6d4a0dff-557a-603a-8b28-c47bd843b6b4"
                        href="https://assets.kpmg/content/dam/kpmg/lu/pdf/private-debt-fund-survey-2020.pdf"
                        target="_blank" class="tech_external" rel="noopener">https://assets.kpmg/content/dam/kpmg/lu/pdf/private-debt-fund-survey-2020.pdf?</a>utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=Loan%20Note%203%20December%202020&amp;utm_term=PDI_LONENOTE_SUBSCRIBER<br>
                </td>
            </tr>
            </tbody>
        </table>
    </div>
</div>
</body>
</html>

But the content gets cropped every time I run HtmlConverter.convertToPdf() with the HTML content passed as a string.

However, when I remove the last tr element, I get the expected result.

What do you think is causing this? Is it because the table element has too many children?

--- Question Update ----

So after reading the comment from @CptCave, I tried changing the HTML to the following, using the word-break CSS property, which is supposed to work in this case:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
    <style>
        .word-break{
            word-break: break-all;
        }
    </style>
</head>
<body>
<div id="b995877a-0d3c-439f-984e-f9f809d124a5" class="footnotes">
    <table class="word-break">
        <tbody>
        <tr>
            <td id="7673aebd-bc37-198d-932f-987fb16fb503">94</td>
            <td>See ESMA Consultation Paper Guidelines on transaction reporting, reference data, order record
                keeping &amp; clock synchronisation, 23 December 2015, ESMA/2015/1909, p. 49; <a
                        id="5326eab7-02a4-69ec-9069-2d0c8eb5f180"
                        href="https://www.esma.europa.eu/sites/default/files/library/2015-1909_guidelines_on_transaction_reporting_reference_data_order_record_keeping_and_clock_synchronisation.pdf"
                        target="_blank" class="tech_external" rel="noopener">https://www.esma.europa.eu/sites/default/files/library/2015-1909_guidelines_on_transaction_reporting_reference_data_order_record_keeping_and_clock_synchronisation.pdf</a>
                (accessed on 13 April 2021)
            </td>
        </tr>
        </tbody>
    </table>
</div>
</body>
</html>

However, this still did not produce the expected result.

The solution was to add inline CSS:

<table style="word-wrap: break-word">

To accomplish this, I modified the document structure with jsoup before converting it:

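import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Set an inline style on every table (this overwrites any existing style attribute).
Document document = Jsoup.parse(html);
document.getElementsByTag("table").forEach(table -> {
    table.attr("style", "word-wrap: break-word");
});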

Upvotes: 1

Views: 520

Answers (1)

kHLVT

Reputation: 103

As far as I can see, your issue is caused by the lack of word wrapping. Your last table row contains a long uninterrupted string: the link followed by the UTM parameters. If you remove the UTM parameters, as in the row below, the cropping goes away:

            <tr>
                <td id="228c3276-de18-1393-9860-66ff5272b741">13</td>
                <td>KPMG – ALFI Private Debt Fund Survey 2020, pages 4 and 5, <br><a
                        id="6d4a0dff-557a-603a-8b28-c47bd843b6b4"
                        href="https://assets.kpmg/content/dam/kpmg/lu/pdf/private-debt-fund-survey-2020.pdf"
                        target="_blank" class="tech_external" rel="noopener">https://assets.kpmg/content/dam/kpmg/lu/pdf/private-debt-fund-survey-2020.pdf</a><br>
                </td>
            </tr>
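
If the footnotes are generated programmatically, the UTM parameters could also be dropped before the URL is written into the HTML. The following is only a minimal sketch using the Java standard library; the class name UtmStripper and the method stripUtm are hypothetical and not part of iText or jsoup, and it assumes that every query parameter whose name starts with "utm_" is safe to discard.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper (not an iText or jsoup API): removes utm_* query parameters.
final class UtmStripper {

    static String stripUtm(String url) {
        // Keep the fragment (if any) aside so it can be re-attached unchanged.
        int hash = url.indexOf('#');
        String fragment = hash >= 0 ? url.substring(hash) : "";
        String withoutFragment = hash >= 0 ? url.substring(0, hash) : url;

        int queryStart = withoutFragment.indexOf('?');
        if (queryStart < 0) {
            return url; // no query string, nothing to strip
        }

        String base = withoutFragment.substring(0, queryStart);
        // Keep only the parameters whose name does not start with "utm_".
        String kept = Arrays.stream(withoutFragment.substring(queryStart + 1).split("&"))
                .filter(p -> !p.isEmpty())
                .filter(p -> !p.toLowerCase(Locale.ROOT).startsWith("utm_"))
                .collect(Collectors.joining("&"));

        return base + (kept.isEmpty() ? "" : "?" + kept) + fragment;
    }

    public static void main(String[] args) {
        System.out.println(stripUtm(
                "https://assets.kpmg/content/dam/kpmg/lu/pdf/private-debt-fund-survey-2020.pdf"
                        + "?utm_source=Sailthru&utm_medium=email"));
    }
}
```

This only shortens the string; a long URL without tracking parameters can still overflow the cell, so it does not replace word wrapping.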

The more durable solution is to enable word wrapping in CSS by setting the overflow-wrap property to break-word.

There is a full example of this in the iText KB: https://kb.itextpdf.com/home/it7kb/examples/pdfhtml-support-for-overflow-wrap-word-break-css-properties
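
One way to apply such a rule without touching each table is to inject a style element into the HTML string before passing it to HtmlConverter.convertToPdf(). The sketch below uses only the Java standard library; the class WrapStyleInjector, the method injectWrapRule and the td selector are illustrative assumptions, and whether the rule takes effect depends on the pdfHTML version in use (the question reports that a class-based word-break rule did not help, while an inline word-wrap style did).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an iText API): adds a wrapping rule to an HTML string.
final class WrapStyleInjector {

    // The property and value come from the answer; the td selector is an assumption.
    static final String RULE = "<style>td { overflow-wrap: break-word; }</style>";

    private static final Pattern HEAD_CLOSE =
            Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    static String injectWrapRule(String html) {
        Matcher m = HEAD_CLOSE.matcher(html);
        if (m.find()) {
            // Insert the rule just before the closing head tag.
            return html.substring(0, m.start()) + RULE + html.substring(m.start());
        }
        // No head element found: prepend the rule and rely on lenient HTML parsing.
        return RULE + html;
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body><table><tr><td>x</td></tr></table></body></html>";
        // The returned string is what would be handed to the converter.
        System.out.println(injectWrapRule(html));
    }
}
```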

Upvotes: 2
