Apache POI - how to remove all the links from Word documents

Question

I want to remove all the hyperlinks from a Word document while keeping their text. I have these two methods for reading Word documents with the .doc and .docx extensions:

private void readDocXExtensionDocument(){
    File inputFile = new File(inputFolderDir, "test.docx");
    try {
        XWPFDocument document = new XWPFDocument(OPCPackage.open(new FileInputStream(inputFile)));
        XWPFWordExtractor extractor = new XWPFWordExtractor(document);
        extractor.setFetchHyperlinks(true);
        String context =  extractor.getText();
        System.out.println(context);
    } catch (InvalidFormatException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

}

private void readDocExtensionDocument(){
    File inputFile = new File(inputFolderDir, "test.doc");
    POIFSFileSystem fs;
    try {
        fs = new POIFSFileSystem(new FileInputStream(inputFile));
        HWPFDocument document = new HWPFDocument(fs);
        WordExtractor wordExtractor = new WordExtractor(document);
        String[] paragraphs = wordExtractor.getParagraphText();
        System.out.println("Word document has " + paragraphs.length + " paragraphs");
        for(int i=0; i



Is it possible to remove all the links from a Word document using the Apache POI library? If not, are there any other libraries that can do this?
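For the .docx case, it may help to see what "removing a link but keeping the text" means at the XML level. In a .docx file the body is stored in the zip entry `word/document.xml`, and a relationship-based hyperlink is a `w:hyperlink` element wrapping the runs that hold the visible text. The sketch below uses only the JDK (not POI) and unwraps those elements in an XML string. The class name `HyperlinkUnwrapper` and its methods are hypothetical names chosen for illustration. It assumes links are stored as `w:hyperlink` elements; links stored as `HYPERLINK` field codes, and the blue/underlined "Hyperlink" character style on the runs, are not handled here.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Apache POI.
final class HyperlinkUnwrapper {

    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces every w:hyperlink element with its own children,
    // so the runs (and their text) stay in place. Returns the number unwrapped.
    static int unwrap(Document doc) {
        NodeList live = doc.getElementsByTagNameNS(W_NS, "hyperlink");
        // The NodeList is live, so copy it before modifying the tree.
        List<Node> links = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            links.add(live.item(i));
        }
        for (Node link : links) {
            Node parent = link.getParentNode();
            // insertBefore moves a node that is already in the tree.
            while (link.getFirstChild() != null) {
                parent.insertBefore(link.getFirstChild(), link);
            }
            parent.removeChild(link);
        }
        return links.size();
    }

    // Convenience wrapper: parse an XML string, unwrap, serialize back.
    static String unwrapXml(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
        unwrap(doc);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

To apply this to a real file, one would read `word/document.xml` out of the .docx with `java.util.zip`, pass it through `unwrapXml`, and write the archive back with that entry replaced. Headers, footers and footnotes live in separate parts and would need the same treatment. The unused relationship entries in `word/_rels/document.xml.rels` can be left in place. The binary .doc format stores links differently and is not covered by this sketch.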
