Reputation: 322
This command works when run manually:
!antiword "test" > "test.docx"
but the following script converts the files to empty .docx files:
import os
import subprocess

for file in os.listdir(directory):
    subprocess.run(["bash", "-c", "antiword \"$1\" > \"$1\".docx", "_", file])
Also, it stores each .docx file in the parent directory: e.g. if the file is in \a\b, this command stores the output in \a.
I have tried many different approaches, including running the commands directly in the terminal and using bash loops. Only the manual way works.
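One plausible explanation for both symptoms, assuming the script is started from \a while `directory` points at \a\b: `os.listdir` returns bare file names, not paths. The shell then looks for each name in the current working directory, antiword cannot find it (its error goes to stderr, not into the redirect), and the `>` redirect still creates an empty `<name>.docx` in the working directory. The sketch below only demonstrates the name-versus-path difference; the directory layout and the helper `listdir_names_vs_paths` are hypothetical.

```python
import os
import tempfile

def listdir_names_vs_paths(directory):
    """Return (bare names, full paths) for the entries in `directory`."""
    names = sorted(os.listdir(directory))
    # os.listdir gives names relative to `directory`, so they must be joined
    # back onto it before being handed to another process.
    paths = [os.path.join(directory, name) for name in names]
    return names, paths

# Hypothetical layout: <root>/b/test.doc, with the script run from <root>
root = tempfile.mkdtemp()
sub = os.path.join(root, "b")
os.makedirs(sub)
open(os.path.join(sub, "test.doc"), "w").close()

os.chdir(root)  # emulate running the script from the parent directory
names, paths = listdir_names_vs_paths(sub)
print(names)                     # ['test.doc']
print(os.path.exists(names[0]))  # False: resolved against <root>, not <root>/b
print(os.path.exists(paths[0]))  # True
```

If this is the cause, passing `os.path.join(directory, file)` to the subprocess (and deriving the output name from that same full path) should put each output file next to its source.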
Upvotes: 1
Views: 1020
Reputation: 7944
Use Apache Tika + GNU parallel + pandoc (antiword doesn't work well for all kinds of .doc files):
parallel "java -jar tika-app-3.0.0.jar -T {} | pandoc --to docx > {.}.docx" ::: *.doc
https://tika.apache.org/
https://pandoc.org/
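The same pipeline can be driven from Python if GNU parallel isn't available. This is a sketch, not the answerer's code: a `ThreadPoolExecutor` stands in for `parallel`, and it assumes `java`, `pandoc`, and `tika-app-3.0.0.jar` (in the working directory) are all present. `build_commands`, `convert`, and `convert_all` are hypothetical helper names.

```python
import glob
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumption: the Tika app jar sits in the current directory under this name,
# matching the one-liner; adjust to your download.
TIKA_JAR = "tika-app-3.0.0.jar"

def build_commands(doc_path, tika_jar=TIKA_JAR):
    """Return (tika_cmd, pandoc_cmd, out_path) for one .doc file."""
    # Mirrors GNU parallel's {.}: the input path with its extension removed.
    out_path = os.path.splitext(doc_path)[0] + ".docx"
    # -T: plain-text (main content) output, as in the one-liner
    tika_cmd = ["java", "-jar", tika_jar, "-T", doc_path]
    # pandoc reads the text from stdin and writes the .docx with -o
    pandoc_cmd = ["pandoc", "--to", "docx", "-o", out_path]
    return tika_cmd, pandoc_cmd, out_path

def convert(doc_path):
    tika_cmd, pandoc_cmd, out_path = build_commands(doc_path)
    # Equivalent of `tika ... | pandoc ...`: Tika's stdout feeds pandoc's stdin.
    tika = subprocess.Popen(tika_cmd, stdout=subprocess.PIPE)
    pandoc = subprocess.Popen(pandoc_cmd, stdin=tika.stdout)
    tika.stdout.close()  # so Tika gets SIGPIPE if pandoc exits early
    pandoc_rc = pandoc.wait()
    tika_rc = tika.wait()
    if tika_rc != 0 or pandoc_rc != 0:
        raise RuntimeError(f"conversion failed for {doc_path}")
    return out_path

def convert_all(pattern="*.doc", workers=4):
    # Threads only wait on child processes, so a thread pool is enough here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert, glob.glob(pattern)))
```

Calling `convert_all()` from the folder holding the .doc files converts each one to a .docx beside it. As in the one-liner, pandoc parses Tika's plain text using its default input format (Markdown).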
Upvotes: 1
Reputation: 168814
Something like this should work (adjust dest_path etc. accordingly).
import os
import shlex

for filename in os.listdir(directory):
    # Only process .doc files (endswith also skips .docx files)
    if not filename.endswith(".doc"):
        continue
    path = os.path.join(directory, filename)
    dest_path = os.path.splitext(path)[0] + ".txt"
    cmd = "antiword %s > %s" % (shlex.quote(path), shlex.quote(dest_path))
    print(cmd)
    # If the above seems to print correct commands, add:
    # os.system(cmd)
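Once the printed commands look right, a variant that avoids the shell entirely may be easier to get right: pass the full path to antiword as a list argument and hand it an open file as stdout instead of using `>`. This is a sketch, not the answerer's code; `convert_docs` and its `converter` parameter are hypothetical, the latter added only so the loop can be tried with another command.

```python
import os
import subprocess

def convert_docs(directory, converter=("antiword",)):
    """Convert every *.doc in `directory` to a .txt file next to it.

    `converter` is the command run on each file; its stdout becomes the
    .txt contents. It defaults to antiword.
    """
    written = []
    for filename in sorted(os.listdir(directory)):
        # endswith() skips .docx and names that merely contain ".doc"
        if not filename.lower().endswith(".doc"):
            continue
        path = os.path.join(directory, filename)
        dest_path = os.path.splitext(path)[0] + ".txt"
        # Opening the output file ourselves replaces the shell `>` redirect,
        # so no quoting is needed and no shell is involved.
        with open(dest_path, "wb") as out:
            subprocess.run([*converter, path], stdout=out, check=True)
        written.append(dest_path)
    return written
```

For example, `convert_docs(directory)` returns the list of .txt files it wrote. Because of `check=True`, a failing conversion raises `subprocess.CalledProcessError`, and the partly written .txt file is left in place.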
Upvotes: 2