Reputation: 113
I'm reading a .docx file to extract its tables and write each table to its own .txt file. I can print the rows and columns in the terminal, and a .txt file is created for each table, but the created files are empty. Here is what I have:
from docx import Document

document = Document('my_doc.docx')
tables = document.tables  # Stores all tables in this variable
c = 1
for table in tables:
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                print(paragraph.text)
                with open('table_'+str(c)+'.txt', 'w') as f:
                    f.write(paragraph.text)
    c += 1
If I do f.write(str(paragraph)) instead of f.write(paragraph.text), it only writes the location where the tables are stored. What am I doing wrong? How can I save the actual content of the tables to the text files?
Thanks!
Upvotes: 2
Views: 1898
Reputation: 3752
The file you are writing to shouldn't be opened in the middle of the loop. Opening a file in "w" (write) mode clears any previous content, so each pass through the inner loop wipes out what the previous pass wrote. You only need one file per table, so open it at the table level.
for table in tables:
    with open('table_'+str(c)+'.txt', 'w') as f:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    print(paragraph.text)
                    f.write(paragraph.text)
    c += 1
It is probably possible to put the c += 1 before the with line (just start from c=0, not c=1), which would make it easier to see in which loop c is incremented.
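As a variation on the loop above, here is a minimal sketch that drops the manual counter in favor of enumerate and writes one tab-separated line per row, so the cells do not run together in the output. The function name write_tables, the out_dir parameter, and the tab-separated layout are assumptions made for illustration, not part of the original answer. It only relies on each table exposing .rows, each row exposing .cells, and each cell exposing .text, which python-docx table objects provide.

```python
from pathlib import Path


def write_tables(tables, out_dir="."):
    """Write each table to table_<n>.txt, one tab-separated line per row.

    `tables` is any iterable of objects with .rows -> .cells -> .text,
    such as Document(...).tables from python-docx.
    """
    paths = []
    for n, table in enumerate(tables, start=1):
        path = Path(out_dir) / f"table_{n}.txt"
        # Open once per table, so "w" truncates the file only once.
        with open(path, "w", encoding="utf-8") as f:
            for row in table.rows:
                f.write("\t".join(cell.text for cell in row.cells) + "\n")
        paths.append(path)
    return paths


# Usage with python-docx (assumes my_doc.docx exists):
#   from docx import Document
#   write_tables(Document("my_doc.docx").tables)
```

Using enumerate keeps the file number tied to the table loop, which is the same point the c += 1 remark makes.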
Upvotes: 2
Reputation: 421
The problem is that you're opening the file up every single time you go through the for loop like so:
with open('table_'+str(c)+'.txt', 'w') as f:
This opens the file for writing, but it also clears the file of any existing text, which is probably why your files are empty.
To fix this, you can use 'a' instead of 'w', thus appending to the file instead of writing it all over again.
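The difference between the two modes can be seen in a small sketch that reopens a file for every line, the way the loop in the question does. The helper write_lines and the sample strings are hypothetical, and a temporary directory stands in for the real output files. One caveat with 'a': it also keeps content from earlier runs of the script, so the files grow each time unless they are deleted first.

```python
import os
import tempfile


def write_lines(path, lines, mode):
    """Reopen the file for every line, as the loop in the question does."""
    for line in lines:
        with open(path, mode) as f:
            f.write(line + "\n")


# Hypothetical cell texts; the last one is empty, like a blank trailing cell.
lines = ["first", "second", ""]

with tempfile.TemporaryDirectory() as tmp:
    w_path = os.path.join(tmp, "w.txt")
    a_path = os.path.join(tmp, "a.txt")
    write_lines(w_path, lines, "w")  # each open truncates the file
    write_lines(a_path, lines, "a")  # each open adds to the end
    with open(w_path) as f:
        w_content = f.read()
    with open(a_path) as f:
        a_content = f.read()

print(repr(w_content))  # '\n' (only the last, empty line survives)
print(repr(a_content))  # 'first\nsecond\n\n'
```

With 'w', only the final write survives, so a blank last cell leaves an apparently empty file.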
Upvotes: 1