Reputation: 20906
I need to programmatically convert a Word XML file into an RTF file. This has become a requirement because of some third-party libraries. Is there an API or library that can do that?
The language is not really a problem, because I just need to get the work done, but Java, .NET languages, or Python are preferred.
Upvotes: 3
Views: 3406
Reputation: 6581
From Java, you could use Docmosis to do the conversion and, optionally, populate the document. It sits on top of OpenOffice to perform the format conversions. If you install OpenOffice and manually load and save a few example documents, you'll get a feel for whether the format conversions are good enough for you. If so, you can use Docmosis to drive it from Java.
Upvotes: 0
Reputation: 3282
A Python/Linux way:
You need the OpenOffice UNO bridge (on a server you can run OpenOffice in headless mode). With it you can convert any format OpenOffice can read into any format it can write:
See http://wiki.services.openoffice.org/wiki/Framework/Article/Filter/FilterList_OOo_3_0
Example command to start OpenOffice as a headless listener:
/usr/lib64/openoffice.org/program/soffice.bin -accept=socket,host=localhost,port=8100\;urp -headless
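If you would rather start the listener from a script than from a shell, a minimal sketch follows. The binary path and port are the ones from the command above; the helper names `build_listener_command` and `start_listener` are made up for illustration. The `double_dash` flag is there because newer LibreOffice builds expect `--accept` and `--headless`, so check which form your installation takes.

```python
import subprocess

def build_listener_command(binary="/usr/lib64/openoffice.org/program/soffice.bin",
                           port=8100, double_dash=False):
    """Return the argument list that starts a headless office listener."""
    dash = "--" if double_dash else "-"
    accept = "%saccept=socket,host=localhost,port=%d;urp" % (dash, port)
    return [binary, accept, dash + "headless"]

def start_listener(**kwargs):
    """Launch the office process in the background and return the Popen handle."""
    # No shell is involved, so the ';' does not need escaping here.
    return subprocess.Popen(build_listener_command(**kwargs))

if __name__ == "__main__":
    # Only prints the command; call start_listener() to actually run it.
    print(" ".join(build_listener_command()))
```

The process needs a moment to open the socket, so a script that connects right after `start_listener()` may have to retry.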
Python Example:
import uno
from os.path import abspath, isfile, splitext
from com.sun.star.beans import PropertyValue
from com.sun.star.task import ErrorCodeIOException
from com.sun.star.connection import NoConnectException
FAMILY_TEXT = "Text"
FAMILY_SPREADSHEET = "Spreadsheet"
FAMILY_PRESENTATION = "Presentation"
FAMILY_DRAWING = "Drawing"
DEFAULT_OPENOFFICE_PORT = 8100
FILTER_MAP = {
    "pdf": {
        FAMILY_TEXT: "writer_pdf_Export",
        FAMILY_SPREADSHEET: "calc_pdf_Export",
        FAMILY_PRESENTATION: "impress_pdf_Export",
        FAMILY_DRAWING: "draw_pdf_Export"
    },
    "html": {
        FAMILY_TEXT: "HTML (StarWriter)",
        FAMILY_SPREADSHEET: "HTML (StarCalc)",
        FAMILY_PRESENTATION: "impress_html_Export"
    },
    "odt": { FAMILY_TEXT: "writer8" },
    "doc": { FAMILY_TEXT: "MS Word 97" },
    "rtf": { FAMILY_TEXT: "Rich Text Format" },
    "txt": { FAMILY_TEXT: "Text" },
    "docx": { FAMILY_TEXT: "MS Word 2007 XML" },
    "ods": { FAMILY_SPREADSHEET: "calc8" },
    "xls": { FAMILY_SPREADSHEET: "MS Excel 97" },
    "odp": { FAMILY_PRESENTATION: "impress8" },
    "ppt": { FAMILY_PRESENTATION: "MS PowerPoint 97" },
    "swf": { FAMILY_PRESENTATION: "impress_flash_Export" }
}
class DocumentConverter:

    def __init__(self, port=DEFAULT_OPENOFFICE_PORT):
        # Connect to the OpenOffice instance listening on the given port.
        localContext = uno.getComponentContext()
        resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
        try:
            self.context = resolver.resolve("uno:socket,host=localhost,port=%s;urp;StarOffice.ComponentContext" % port)
        except NoConnectException:
            raise Exception("failed to connect to OpenOffice.org on port %s" % port)
        self.desktop = self.context.ServiceManager.createInstanceWithContext("com.sun.star.frame.Desktop", self.context)

    def convert(self, inputFile, outputFile):
        inputUrl = self._toFileUrl(inputFile)
        outputUrl = self._toFileUrl(outputFile)
        document = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, self._toProperties(Hidden=True))
        #document.setPropertyValue("DocumentTitle", "saf" ) TODO: Check how this can be set and set doc update mode to FULL_UPDATE
        if self._detectFamily(document) == FAMILY_TEXT:
            # Update the indexes, refresh the layout, then update the
            # indexes again so page numbers reflect the refreshed document.
            indexes = document.getDocumentIndexes()
            for i in range(0, indexes.getCount()):
                index = indexes.getByIndex(i)
                index.update()
            try:
                document.refresh()
            except AttributeError:
                pass
            indexes = document.getDocumentIndexes()
            for i in range(0, indexes.getCount()):
                index = indexes.getByIndex(i)
                index.update()
        outputExt = self._getFileExt(outputFile)
        filterName = self._filterName(document, outputExt)
        try:
            document.storeToURL(outputUrl, self._toProperties(FilterName=filterName))
        finally:
            document.close(True)

    def _filterName(self, document, outputExt):
        # Pick the export filter from the output extension and document family.
        family = self._detectFamily(document)
        try:
            filterByFamily = FILTER_MAP[outputExt]
        except KeyError:
            raise Exception("unknown output format: '%s'" % outputExt)
        try:
            return filterByFamily[family]
        except KeyError:
            raise Exception("unsupported conversion: from '%s' to '%s'" % (family, outputExt))

    def _detectFamily(self, document):
        if document.supportsService("com.sun.star.text.GenericTextDocument"):
            # NOTE: a GenericTextDocument is either a TextDocument, a WebDocument, or a GlobalDocument
            # but this further distinction doesn't seem to matter for conversions
            return FAMILY_TEXT
        if document.supportsService("com.sun.star.sheet.SpreadsheetDocument"):
            return FAMILY_SPREADSHEET
        if document.supportsService("com.sun.star.presentation.PresentationDocument"):
            return FAMILY_PRESENTATION
        if document.supportsService("com.sun.star.drawing.DrawingDocument"):
            return FAMILY_DRAWING
        raise Exception("unknown document family: %s" % document)

    def _getFileExt(self, path):
        ext = splitext(path)[1]
        if ext is not None:
            return ext[1:].lower()

    def _toFileUrl(self, path):
        return uno.systemPathToFileUrl(abspath(path))

    def _toProperties(self, **args):
        # Convert keyword arguments into a tuple of UNO PropertyValue structs.
        props = []
        for key in args:
            prop = PropertyValue()
            prop.Name = key
            prop.Value = args[key]
            props.append(prop)
        return tuple(props)

if __name__ == "__main__":
    from sys import argv, exit
    if len(argv) < 3:
        print("USAGE: python %s <input-file> <output-file>" % argv[0])
        exit(255)
    if not isfile(argv[1]):
        print("no such input file: %s" % argv[1])
        exit(1)
    try:
        converter = DocumentConverter()
        converter.convert(argv[1], argv[2])
    except Exception as exception:
        print("ERROR!" + str(exception))
        exit(1)
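For the RTF case in the question, the script picks the export filter from the output file's extension. The lookup can be tried without a running OpenOffice; the sketch below copies only a few text-document entries from the filter map, and `text_filter_for` is a made-up helper name, not part of the script.

```python
from os.path import splitext

# A few text-document entries copied from the script's filter map.
TEXT_FILTERS = {
    "rtf": "Rich Text Format",
    "doc": "MS Word 97",
    "odt": "writer8",
    "pdf": "writer_pdf_Export",
}

def text_filter_for(output_path):
    """Return the export filter name chosen for a text document's output file."""
    # Same extension handling as the script: drop the dot, lowercase the rest.
    ext = splitext(output_path)[1][1:].lower()
    try:
        return TEXT_FILTERS[ext]
    except KeyError:
        raise ValueError("unknown output format: '%s'" % ext)

if __name__ == "__main__":
    print(text_filter_for("report.RTF"))  # prints: Rich Text Format
```

So giving the script an output name that ends in `.rtf` is all that is needed to select the RTF filter. Whether the Word XML input loads correctly still depends on OpenOffice recognizing that file.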
Upvotes: 2
Reputation: 52858
You can use AutoIt to automate opening the XML files in Word and doing a Save As to RTF.
I've used the user-defined functions for Word to save RTF files as plain text for conversion, and it works well. The syntax is very easy.
http://www.autoitscript.com/autoit3/index.shtml
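Since Python was one of the preferred languages, the same open-and-save-as idea can also be sketched with Word's COM automation instead of AutoIt. This is a different technique from the AutoIt functions mentioned above and rests on assumptions: Windows, an installed copy of Word, and the third-party pywin32 package. The function name `word_xml_to_rtf` is made up for illustration.

```python
WD_FORMAT_RTF = 6  # value of Word's wdFormatRTF save-format constant

def word_xml_to_rtf(input_path, output_path):
    """Open a document in Word via COM and save it as RTF (Windows only)."""
    # Imported lazily so the module still loads where pywin32 is absent.
    import os
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word expects absolute paths when driven through COM.
        doc = word.Documents.Open(os.path.abspath(input_path))
        try:
            doc.SaveAs(os.path.abspath(output_path), WD_FORMAT_RTF)
        finally:
            doc.Close(False)  # do not save changes to the source file
    finally:
        word.Quit()
```

A call such as `word_xml_to_rtf("input.xml", "output.rtf")` would then do one conversion; the file names are placeholders.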
Upvotes: 0
Reputation: 8837
Have a look at Docvert. You'll have to set it up yourself because, I believe, the demo only lets you upload OpenOffice documents.
Upvotes: 0
Reputation: 10444
Java
I've used Apache POI in the past to parse Word Documents. It seemed to work pretty well. Then here are some libraries to write to RTF.
.Net
Here's an article about writing to a Word Document in .Net. I'm sure you could use the same library for reading.
Python
Here is an article for Python.
Related Question
Also, here is a related if not duplicate question.
Upvotes: 0