Saeed Heidarbozorg

Reputation: 158

Create a PDF from a Persian HTML file with iTextSharp

I use the iTextSharp library to convert HTML to PDF. My users write Persian sentences in their HTML files, and the library does not render Persian words correctly.

To fix this, along with the right-to-left problem, I use the code below:

        Document document = new Document(PageSize.A4, 80, 50, 30, 65);
        PdfWriter.GetInstance(document, new FileStream(strPDFpath, FileMode.Create));
        document.Open();

        ArrayList objects;
        document.NewPage();

        // Parse the HTML as UTF-8; `styles` is a StyleSheet defined elsewhere.
        objects = iTextSharp.text.html.simpleparser.
        HTMLWorker.ParseToList(new StreamReader(strHTMLpath, Encoding.UTF8), styles);

        BaseFont bf = BaseFont.CreateFont("c:\\windows\\fonts\\Tahoma.ttf",
                                        BaseFont.IDENTITY_H, true);
        for (int k = 0; k < objects.Count; k++)
        {
            PdfPTable table = new PdfPTable(1);
            table.RunDirection = PdfWriter.RUN_DIRECTION_RTL;

            var els = (IElement)objects[k];
            foreach (Chunk el in els.Chunks)
            {
                #region set persian font
               iTextSharp.text.Font f2 = new iTextSharp.text.Font(bf, el.Font.Size,
                                                el.Font.Style, el.Font.Color);
                el.Font = f2;
                #endregion set persian font

                #region Set right to left for persian words
                PdfPCell cell = new PdfPCell(new Phrase(10, el.Content, el.Font));
                cell.BorderWidth = 0;
                table.AddCell(cell);
                #endregion Set right to left for persian words
            }
            //document.Add((IElement)objects[k]);                
            document.Add(table);
        }

        document.Close();
        // Send the file as a download; expose only the file name, not the full server path.
        Response.ClearContent();
        Response.ClearHeaders();
        Response.AddHeader("Content-Disposition", "attachment; filename=" + Path.GetFileName(strPDFpath));
        Response.ContentType = "application/octet-stream";
        Response.WriteFile(strPDFpath);
        Response.Flush();
        Response.Close();
        if (File.Exists(strPDFpath))
        {
            File.Delete(strPDFpath);
        }

This fixed the right-to-left layout and the Persian text, but it introduced another problem.

My code can't parse and convert the content of table tags used in the HTML file.

Now the question is: how do I parse an HTML file that contains table, div, and paragraph tags with Persian sentences, and convert it to PDF?

Upvotes: 6

Views: 6566

Answers (2)

VahidN

Reputation: 19156

iTextSharp can parse table tags too, but it does not set their RTL properties, so you need to set them yourself:

            foreach (var htmlElement in parsedHtmlElements)
            {
                fixRunDirection(htmlElement);
                pdfCell.AddElement(htmlElement);
            }

...

        private static void fixRunDirection(IElement htmlElement)
        {
            if (!(htmlElement is PdfPTable)) return;

            var table = (PdfPTable)htmlElement;
            table.RunDirection = PdfWriter.RUN_DIRECTION_RTL;

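            foreach (var row in table.Rows)
            {
                foreach (var cell in row.GetCells())
                {
                    // Positions covered by a colspan/rowspan can be null.
                    if (cell == null) continue;

                    cell.RunDirection = PdfWriter.RUN_DIRECTION_RTL;

                    // Cells holding only plain text have no composite elements.
                    if (cell.CompositeElements == null) continue;

                    // Recurse so that nested tables are fixed as well.
                    foreach (var element in cell.CompositeElements)
                    {
                        fixRunDirection(element);
                    }
                }
            }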
        }
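
The two fragments above leave `parsedHtmlElements` and `pdfCell` undefined. As a rough sketch of how they might fit together, assuming iTextSharp 5.x, a UTF-8 HTML file, and the Tahoma font from the question: the class name `RtlHtmlToPdf`, the method `Convert`, and its parameters are hypothetical names for illustration, and the `"face"`/`"encoding"` style keys are the ones `HTMLWorker` is assumed to read when picking a font.

```csharp
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

public static class RtlHtmlToPdf
{
    // Hypothetical helper: parses a UTF-8 HTML file and writes it into one RTL cell.
    public static void Convert(string htmlPath, string pdfPath)
    {
        // Register Tahoma so the parser can resolve it by family name.
        FontFactory.Register("c:\\windows\\fonts\\Tahoma.ttf");

        // Assumption: these tag styles make HTMLWorker use a Unicode-capable font.
        var styles = new StyleSheet();
        styles.LoadTagStyle("body", "face", "tahoma");
        styles.LoadTagStyle("body", "encoding", BaseFont.IDENTITY_H);

        var document = new Document(PageSize.A4, 80, 50, 30, 65);
        using (var output = new FileStream(pdfPath, FileMode.Create))
        using (var reader = new StreamReader(htmlPath, Encoding.UTF8))
        {
            PdfWriter.GetInstance(document, output);
            document.Open();

            // A single borderless RTL cell that hosts every parsed element.
            var pdfCell = new PdfPCell
            {
                Border = Rectangle.NO_BORDER,
                RunDirection = PdfWriter.RUN_DIRECTION_RTL
            };

            foreach (IElement htmlElement in HTMLWorker.ParseToList(reader, styles))
            {
                FixRunDirection(htmlElement);
                pdfCell.AddElement(htmlElement);
            }

            var wrapper = new PdfPTable(1) { WidthPercentage = 100 };
            wrapper.AddCell(pdfCell);
            document.Add(wrapper);
            document.Close();
        }
    }

    // Same idea as fixRunDirection above: mark parsed tables and their cells as RTL.
    public static void FixRunDirection(IElement element)
    {
        var table = element as PdfPTable;
        if (table == null) return;

        table.RunDirection = PdfWriter.RUN_DIRECTION_RTL;
        foreach (PdfPRow row in table.Rows)
        {
            foreach (PdfPCell cell in row.GetCells())
            {
                if (cell == null) continue; // spanned positions can be null
                cell.RunDirection = PdfWriter.RUN_DIRECTION_RTL;
                if (cell.CompositeElements == null) continue;
                foreach (IElement child in cell.CompositeElements)
                {
                    FixRunDirection(child);
                }
            }
        }
    }
}
```

Wrapping everything in one RTL cell means paragraphs and divs pick up the run direction from the cell, while tables produced by the parser still need the explicit fix because they carry their own direction. `HTMLWorker` is marked obsolete in later 5.x releases, so this may compile with a warning there.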


Upvotes: 4

Sleeper Smith

Reputation: 3242

Try wkhtmltopdf: http://code.google.com/p/wkhtmltopdf/

It reads an HTML page and saves it as a PDF. Run it from C# as an external process.
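
A minimal sketch of launching it from C#, assuming the `wkhtmltopdf` executable is installed and reachable on the PATH (or that you pass its full path). The class `WkHtmlToPdfRunner` and its methods are hypothetical names for illustration; only the basic `wkhtmltopdf <input> <output>` invocation is used.

```csharp
using System;
using System.Diagnostics;

public static class WkHtmlToPdfRunner
{
    // Builds the argument string "<input> <output>", quoting both paths
    // so that paths containing spaces are passed as single arguments.
    public static string BuildArguments(string htmlPath, string pdfPath)
    {
        return "\"" + htmlPath + "\" \"" + pdfPath + "\"";
    }

    // Runs wkhtmltopdf and returns its exit code (0 normally means success).
    // exePath is an assumption: adjust it to wherever the tool is installed.
    public static int Convert(string htmlPath, string pdfPath, string exePath = "wkhtmltopdf")
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = BuildArguments(htmlPath, pdfPath),
            UseShellExecute = false, // start the executable directly, without a shell
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A call such as `WkHtmlToPdfRunner.Convert(strHTMLpath, strPDFpath)` would then take the place of the iTextSharp conversion step. Because the tool renders the page with a browser engine, table and right-to-left handling come from the HTML and CSS themselves (for example `dir="rtl"`) rather than from PDF-side properties.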

Upvotes: 1
