Linda

Reputation: 1233

Transforming an HTML file in Java

public String transform_XML(String type, InputStream file) throws TransformerException {
        TransformerFactory tf = TransformerFactory.newInstance();
        // The stylesheet is looked up on the classpath by type name.
        String xslfile = "/StyleSheets/" + type + ".xsl";
        Transformer t = tf.newTemplates(new StreamSource(this.getClass().getResourceAsStream(xslfile))).newTransformer();
        Source source = new StreamSource(file);
        CharArrayWriter wr = new CharArrayWriter();
        StreamResult result = new StreamResult(wr);
        t.transform(source, result);
        return wr.toString();
}

The method above takes an XSL stylesheet and an XML file as input and returns the transformed result as a String. It uses classes from the javax.xml.transform package.

Can I use the same package to transform an HTML file? (Since the package name contains "xml", I seriously doubt it.) What should I do to transform an HTML file?

Upvotes: 1

Views: 376

Answers (3)

Grooveek

Reputation: 10094

As I understand your comment, this is mainly for scraping and getting information back.

You can have a look at JSoup, which is very handy for parsing HTML into a DOM and scraping it.

Otherwise, if you want to keep your XSLTs, stemm's solution should be fine.

Upvotes: 1

stemm

Reputation: 6050

As you suspect, HTML documents aren't necessarily valid XML. But you can convert the HTML to XML first and then work with the valid XML (the conversion gives you a DOM tree).

I'd suggest using the CyberNeko HTML Parser to convert HTML into XML.

Draft example:

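import java.io.InputStream;
import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
...
public Document parseHtml(InputStream is) throws Exception {
    // The parser builds a DOM tree even when the HTML is not well-formed.
    DOMParser parser = new DOMParser();
    parser.parse(new InputSource(is));
    return parser.getDocument();
}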

If you use Maven, you can simply add CyberNeko to your project from the repository: http://mvnrepository.com/artifact/nekohtml/nekohtml
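
Once you have a DOM tree, it can be passed to the same javax.xml.transform API through a DOMSource instead of a StreamSource. Below is a minimal sketch of that step, not a definitive implementation. The class name DomXslt and both helper methods are made up for illustration, and parseWellFormed only stands in for the HTML parser by using the JDK's own XML parser on well-formed markup, so the example runs without extra libraries.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class DomXslt {

    // Applies the stylesheet text in xsl to an already parsed DOM tree.
    public static String transform(Document doc, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        // DOMSource lets the transformer read from the tree directly.
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Stand-in for the HTML parser: builds a DOM from well-formed markup only.
    public static Document parseWellFormed(String markup) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // XSLT processors expect namespace-aware DOMs
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(markup)));
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><h1>Hello</h1><p>ignored</p></body></html>";
        String xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'><xsl:value-of select='//h1'/></xsl:template>"
          + "</xsl:stylesheet>";
        System.out.println(transform(parseWellFormed(html), xsl)); // prints "Hello"
    }
}
```

In a real project you would pass the Document returned by parseHtml(...) to transform(...) in place of parseWellFormed(...). One thing to check with your parser is how it cases element names, because the match patterns and XPath expressions in the stylesheet are case-sensitive.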

Upvotes: 1

Murali N

Reputation: 3508

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SimpleXSLT {
  public static void main(String[] args) {

    // The input must be well-formed XML (for example XHTML), or parsing fails.
    String inXML = "C:/tmp/temp.html";
    String inXSL = "C:/tmp/temp.xsl";
    String outTXT = "C:/tmp/temp_copy.html";
    SimpleXSLT st = new SimpleXSLT();
    try {
        st.transform(inXML, inXSL, outTXT);
    } catch (TransformerConfigurationException e) {
        System.err.println("Invalid factory configuration");
        System.err.println(e);
    } catch (TransformerException e) {
        System.err.println("Error during transformation");
        System.err.println(e);
    }
  }

  public void transform(String inXML, String inXSL, String outTXT)
      throws TransformerConfigurationException, TransformerException {
    TransformerFactory factory = TransformerFactory.newInstance();
    StreamSource xslStream = new StreamSource(inXSL);
    Transformer transformer = factory.newTransformer(xslStream);
    // MyErrorListener is a user-supplied javax.xml.transform.ErrorListener (not shown).
    transformer.setErrorListener(new MyErrorListener());
    StreamSource in = new StreamSource(inXML);
    StreamResult out = new StreamResult(outTXT);
    transformer.transform(in, out);
    System.out.println("The generated file is: " + outTXT);
  }
}

Upvotes: 1
