xGeo

Reputation: 2139

Get the number of rows a cell occupies in iTextSharp

Hi everyone.

I want to know if there is a method to accurately get the rows occupied by a certain cell.

Currently I'm doing this using the following function:

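private int GetRowLines(string content, float maxFloatPerRow)
{
    // Treat null as empty so the width call below never receives null.
    if (string.IsNullOrEmpty(content))
        content = string.Empty;

    // Width in points of the whole content if it were laid out on a single line.
    float noteFwidth = BaseFont.GetWidthPoint(content, cellFont.Size);

    // Lines needed = total width / usable width per line, rounded up.
    int nextRowLines = (int)Math.Ceiling(noteFwidth / maxFloatPerRow);

    // An empty cell still occupies one line.
    return nextRowLines == 0 ? 1 : nextRowLines;
}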

The only problem with this is that I need to supply maxFloatPerRow, which I can only determine by trial and error:

  1. I generate a PDF with lots of "i" characters in the particular cell I want to test.
  2. Then I copy the content of the first row of that cell (this is the maximum number of characters that fits in one row of that cell).
  3. Finally, I get the float width of that "max content per row" using the BaseFont.GetWidthPoint method.

However, I want to create a utility method that gives me the number of rows some content will occupy, given the Fwidth of the header, the font of the content, and the content itself (plus more inputs, if needed).

EDIT based on comment:

I am using itext v.3.1.7.0 and I am creating a pdf, not editing an existing one.

I hope you guys have something to share. Thanks.

Upvotes: 3

Views: 1859

Answers (2)

xGeo

Reputation: 2139

After several days of experimenting, here's the workaround I have come up with so far, and it seems to be accurate enough:

    /// <summary>
    /// Gets the number of rows this cell occupies.
    /// </summary>
    /// <param name="headerFwidths">The fwidths of the headers of the table this cell belongs to</param>
    /// <param name="index">The column index of the cell to check</param>
    /// <param name="cCell">The cell to check</param>
    /// <returns>The estimated number of rows (at least 1)</returns>
    public int GetRowLines(float[] headerFwidths, int index, CellValue cCell) {
        // Width available to the table between the document margins.
        float tableWidth = Document.GetRight(Document.LeftMargin);
        // Fall back to 2f when no padding has been set on the cell.
        float lPad = cCell.PaddingLeft != null ? cCell.PaddingLeft.Value : 2f;
        float rPad = cCell.PaddingRight != null ? cCell.PaddingRight.Value : 2f;
        // The column's share of the table width, minus the horizontal padding.
        float maxFloatPerRow = ((tableWidth / headerFwidths.Sum()) * headerFwidths[index]) - (lPad + rPad);
        string content = string.IsNullOrEmpty(cCell.Title) ? string.Empty : cCell.Title;

        // Width of the content on a single line, divided by the usable width, rounded up.
        float cellFontWidth = BaseFont.GetWidthPoint(content, cCell.CellFont.Size);
        int rowLines = (int)Math.Ceiling(cellFontWidth / maxFloatPerRow);
        // An empty cell still occupies one row.
        return rowLines == 0 ? 1 : rowLines;
    }

So here's how it works:

First, get the float[] fWidths of the table headers, because each cell basically follows the fwidth of its header.

Then, get the width of your document using Document.GetRight(Document.LeftMargin).

The next step is to get the left and right padding of the CellValue to be checked.

Note: CellValue is our custom class, which is derived from the PdfPCell class of iTextSharp.

So, using the table width, cell paddings and header fwidths, we can estimate the maxFloatPerRow:

float maxFloatPerRow = ((tableWidth / headerFwidths.Sum()) * headerFwidths[index]) - (lPad + rPad);
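
To make the arithmetic concrete, here is a minimal sketch that only restates the formula above as a standalone function. The names CellWidthEstimate and MaxWidthPerRow, and the sample numbers in the note after it, are made up for illustration and are not part of iTextSharp.

```csharp
using System;
using System.Linq;

public static class CellWidthEstimate
{
    // Usable text width of one column, in points: the column's share of the
    // table width (proportional to its header width) minus left and right padding.
    public static float MaxWidthPerRow(
        float tableWidth, float[] headerWidths, int index, float leftPad, float rightPad)
    {
        float totalHeaderWidth = headerWidths.Sum();
        return (tableWidth / totalHeaderWidth) * headerWidths[index] - (leftPad + rightPad);
    }
}
```

For example, with a hypothetical 600-point-wide table, header widths { 1, 2, 3 }, column index 1 and 2 points of padding on each side, MaxWidthPerRow returns (600 / 6) * 2 - 4 = 196 points.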

We can then get the width of the cell's content, cellFontWidth, using BaseFont.GetWidthPoint.

Finally, we divide cellFontWidth by maxFloatPerRow and round up to get the number of lines the cell occupies.

This might not be 100% accurate but so far this works for our case.
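
One likely source of inaccuracy is that dividing the total width by the line width ignores word wrapping: text normally breaks at spaces, so a line is often not filled completely. Below is a rough sketch of a wrap-aware count. It is a different technique from the method above, it assumes greedy wrapping at single spaces only, and it is not necessarily how iTextSharp itself breaks lines. WrappedLineCounter, CountLines and the measure delegate are hypothetical names; in practice measure could wrap BaseFont.GetWidthPoint at the cell's font size.

```csharp
using System;

public static class WrappedLineCounter
{
    // Counts lines using greedy word wrapping at spaces.
    // measure returns the width of a string in points (supplied by the caller).
    public static int CountLines(string content, float maxWidthPerRow, Func<string, float> measure)
    {
        if (maxWidthPerRow <= 0f)
            throw new ArgumentOutOfRangeException("maxWidthPerRow");
        if (string.IsNullOrEmpty(content))
            return 1; // an empty cell still occupies one line

        string[] words = content.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        float spaceWidth = measure(" ");
        int lines = 1;
        float used = 0f; // width already consumed on the current line

        foreach (string word in words)
        {
            float w = measure(word);
            float needed = used == 0f ? w : used + spaceWidth + w;
            if (needed <= maxWidthPerRow)
            {
                used = needed;
                continue;
            }

            // The word does not fit: move to a new line unless the current one is empty.
            if (used > 0f)
                lines++;

            // A word wider than a whole line spills over as many lines as it needs.
            while (w > maxWidthPerRow)
            {
                lines++;
                w -= maxWidthPerRow;
            }
            used = w;
        }
        return lines;
    }
}
```

With a toy measure of 10 points per character (s => s.Length * 10f) and a line width of 70 points, "aaaa aaaa aaaa aaaa" wraps to 4 lines, whereas dividing the total width (190 points) by 70 and rounding up gives 3.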

I hope this helps anyone with the same case as me. (I'm accepting this as the answer. But in case you have a better answer, please feel free to post. I will gladly accept yours as the answer if proven to be better.)

Upvotes: 2

Joris Schellekens

Reputation: 9012

There are several options. I'll describe only the high-level points of the two easiest solutions.

Approach 1

Use pdf2Data, an iText7 add-on that can turn PDF documents into XML data (given a template that the document matches). This add-on is only available for iText7, however, so it will require some migration effort.

Approach 2

  1. Use an EventListener to gather all line-drawing events from the target page.
  2. Once you have all the line-rendering information, cluster it, putting lines in the same cluster if and only if they intersect at roughly 90-degree angles.
  3. Inspect each cluster; a cluster that contains at least a certain threshold number of lines can be considered a table.
  4. Do a vertical projection of all horizontal lines; this tells you how many rows there are (in total, in the entire table).
  5. Do a horizontal projection of all vertical lines; this tells you how many columns there are (in total, in the entire table).
  6. Now that you have the boundaries of each cell, you can repeat steps 4 and 5 for every sub-range of coordinates within the table to figure out exactly how many rows/columns there are within that coordinate range.
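
The projection in steps 4 and 5 could be sketched roughly as follows. This assumes the line coordinates have already been collected (steps 1 to 3 are not shown) and that each ruling line is reduced to a single position on the projection axis: the y coordinate for horizontal lines, the x coordinate for vertical ones. LineProjection, CountDistinctLevels, CountBands and the tolerance parameter are hypothetical names for illustration, not iText APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineProjection
{
    // Projects line positions onto one axis and counts the distinct levels.
    // A position within tolerance of the first position of the current level
    // is treated as the same ruling line.
    public static int CountDistinctLevels(IEnumerable<float> positions, float tolerance)
    {
        int levels = 0;
        float levelStart = 0f;
        foreach (float p in positions.OrderBy(v => v))
        {
            if (levels == 0 || p - levelStart > tolerance)
            {
                levels++;
                levelStart = p;
            }
        }
        return levels;
    }

    // N distinct ruling lines bound N - 1 rows (or columns).
    public static int CountBands(IEnumerable<float> positions, float tolerance)
    {
        return Math.Max(0, CountDistinctLevels(positions, tolerance) - 1);
    }
}
```

For instance, horizontal lines at y = 100, 100.2, 80, 60, 60.1 and 40 with a tolerance of 0.5 collapse to four distinct levels, which bound three rows. Restricting the input to the lines that fall inside one coordinate range gives the per-range counts described in step 6.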

Upvotes: 3
