Roshan

Reputation: 2059

Converting HTML to an image using Flying Saucer

Using Flying Saucer, I can successfully convert HTML to an image with the code below:

    // doc: the HTML source parsed as an org.w3c.dom.Document
    Java2DRenderer renderer = new Java2DRenderer(doc, width, height);
    BufferedImage img = renderer.getImage();

    ByteArrayOutputStream os = new ByteArrayOutputStream();
    ImageIO.write(img, "jpg", os);

However, this code has problems: it does not render the fonts used in the HTML properly.

Also, when the input contains Chinese, Japanese, or other non-ASCII characters, the image does not show the proper content: the characters are rendered as boxes.

The actual HTML content is:

<div ><ul><li><dl><dt><a href="http://jcs2014.com/ja/about/">イベントについて</a><br></dt><dd><ul><li><a href="http://jcs2014.com/ja/about/support.html">サポーター&amp;フレンズ</a><br></li></ul></dd></dl><dl><dt><a href="http://jcs2014.com/ja/event/">イベント・セミナー一覧</a><br></dt></dl></li></ul><div><br></div></div>

In my case the input can be in any language, but it is always encoded as Unicode. How can I solve this?

Please help.

Upvotes: 2

Views: 2640

Answers (1)

Roshan

Reputation: 2059

    String html = "<div ><ul><li><dl><dt><a href=\"http://jcs2014.com/ja/about/\">イベントについて</a><br></dt><dd><ul><li><a href=\"http://jcs2014.com/ja/about/support.html\">サポーター&amp;フレンズ</a><br></li></ul></dd></dl><dl><dt><a href=\"http://jcs2014.com/ja/event/\">イベント・セミナー一覧</a><br></dt></dl></li></ul><div><br></div></div>";

    // Read the HTML as UTF-8; if the source uses a different encoding, substitute its charset name

    InputStream htmlStream = new ByteArrayInputStream(html.getBytes("UTF-8"));  
    Tidy tidy = new Tidy();      
    org.w3c.dom.Document doc = tidy.parseDOM(new InputStreamReader(htmlStream,"UTF-8"), null);

    Java2DRenderer renderer = new Java2DRenderer(doc, width, height); 
    BufferedImage img = renderer.getImage();
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    ImageIO.write(img, "jpg", os);

This solved my issue: reading the HTML stream as UTF-8 lets the non-ASCII characters render correctly.
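
To illustrate why the charset passed to the reader matters, here is a minimal standalone sketch using only the JDK (no Flying Saucer or JTidy). The class name `CharsetRoundTrip` and the helper `decode` are hypothetical names made up for this example. It encodes a short Japanese string as UTF-8 and reads it back through an `InputStreamReader`, once with the matching charset and once with a mismatched one; only the matching charset recovers the original text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {

    // Hypothetical helper: reads every character from the bytes using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Four katakana characters written as Unicode escapes so the example
        // does not depend on the encoding of the source file.
        String text = "\u30a4\u30d9\u30f3\u30c8";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        String matching = decode(utf8, StandardCharsets.UTF_8);
        String mismatched = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip intact: " + matching.equals(text));
        System.out.println("ISO-8859-1 round trip intact: " + mismatched.equals(text));
    }
}
```

With a mismatched charset each byte of a multi-byte UTF-8 sequence is decoded as a separate character, so the parser never sees the intended text. That is consistent with the fix above, where both `getBytes` and the `InputStreamReader` use the same encoding.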

Upvotes: 1
