siva narayana

Reputation: 41

How to retrieve data from a particular table among multiple tables using python-docx?

I am using python-docx to extract the data from one particular table in a Word file that contains multiple tables. This is the table I need, among the others, and the retrieved data needs to be arranged like this.

Challenges:

  1. Can I find a particular table in a Word file using python-docx?
  2. Can I achieve my requirement using python-docx?

Upvotes: 1

Views: 3052

Answers (1)

Watty62

Reputation: 602

This is not a complete answer, but it should point you in the right direction. It is based on a similar task I have been working on.

I ran the following code in Python 3.6 in a Jupyter notebook, but it should work in plain Python as well.

First we start by importing the Document class from python-docx and pointing it at the document we want to work with.

from docx import Document

# Replace the placeholder with the path to your .docx file
document = Document("<your path to doc>")

We get the list of tables and print how many tables the document contains. We also create a list to hold all the tabular data.

tables = document.tables

print(len(tables))

big_data = []

Next we loop through the tables:

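for table in document.tables:
    data = []
    keys = None
    for i, row in enumerate(table.rows):
        # Read the text of every cell in this row
        text = (cell.text for cell in row.cells)

        if i == 0:
            # The first row holds the column headings
            keys = tuple(text)
            continue
        # Map each column heading to this row's cell text
        row_data = dict(zip(keys, text))
        data.append(row_data)
    # Append once per table, after all of its rows have been read
    big_data.append(data)
print(big_data)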

By looping through all the tables, we read the data into a list of lists. Each inner list represents one table and holds one dictionary per data row. In each dictionary, the keys are the column headings from the table's first row and the values are the cell contents of that row for each column.
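To pick out one particular table from that structure, one option is to look for the table whose column headings match the ones you expect. The sketch below assumes the list / list / dictionary shape described above; `find_tables_with_headings` and the sample data are hypothetical names and values made up for illustration, not part of python-docx.

```python
def find_tables_with_headings(all_tables, required_headings):
    """Return the tables whose rows contain every required column heading.

    all_tables is assumed to be a list of tables, where each table is a
    list of row dictionaries keyed by column heading.
    """
    required = set(required_headings)
    matches = []
    for table in all_tables:
        # A table with only a heading row yields no dictionaries, so skip it
        if not table:
            continue
        # Rows of one table share the same keys, so the first row is enough
        if required.issubset(table[0].keys()):
            matches.append(table)
    return matches


# Hand-made sample in the same shape (not taken from a real document)
sample = [
    [{"Name": "A", "Score": "1"}],
    [{"Version": "1", "Changes": "Initial draft"}],
]
version_tables = find_tables_with_headings(sample, ["Version", "Changes"])
print(version_tables)
```

With real data you would pass `big_data` instead of `sample`. Matching on headings only works if the headings are distinctive; if several tables share them, the function returns all of them and you still have to choose.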

So, that is half of your problem. The next part would be to use python-docx to create a new table in your output document - and to fill it with the appropriate content from the list / list / dictionary data.
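That second half might look roughly like the sketch below. It is not a finished solution: `rows_to_grid` and `write_table` are hypothetical helper names, `document` is assumed to be a python-docx `Document` object, and the file name in the usage comment is a placeholder. The python-docx calls it relies on are `add_table`, `add_row`, `cells` and `cell.text`.

```python
def rows_to_grid(rows):
    """Turn a list of row dictionaries into a heading row plus value rows."""
    if not rows:
        return []
    headings = list(rows[0].keys())
    grid = [headings]
    for row in rows:
        # Missing keys become empty cells
        grid.append([str(row.get(heading, "")) for heading in headings])
    return grid


def write_table(document, rows):
    """Add a table to document (assumed python-docx Document) and fill it."""
    grid = rows_to_grid(rows)
    if not grid:
        raise ValueError("no rows to write")
    # Start with a single row for the column headings
    table = document.add_table(rows=1, cols=len(grid[0]))
    for cell, heading in zip(table.rows[0].cells, grid[0]):
        cell.text = heading
    # Add one table row per dictionary
    for values in grid[1:]:
        cells = table.add_row().cells
        for cell, value in zip(cells, values):
            cell.text = value
    return table


# Usage, assuming python-docx is installed and big_data was built as above:
#   from docx import Document
#   output = Document()
#   write_table(output, big_data[-1])
#   output.save("output.docx")  # placeholder file name
```

Keeping the dictionary-to-grid step separate from the writing step makes it easier to reorder or drop columns before the table is created.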

In the example I have been working on, this is the final table in the document (image: final table).

When I run the routine above, this is the part of my output for that table:

[{'Version': '1', 'Changes': 'Local Outcome Improvement Plan ', 'Page Number': '1-34 and 42-61', 'Approved By': 'CPA Board\n', 'Date ': '22 August 2016'}, 
{'Version': '2', 'Changes': 'People are resilient, included and supported when in need section added ', 'Page Number': '35-41', 'Approved By': 'CPA Board', 'Date ': '12 December 2016'}, 
{'Version': '2', 'Changes': 'Updated governance and accountability structure following approval of the Final Report for the Review of CPA Infrastructure', 'Page Number': '59', 'Approved By': 'CPA Board', 'Date ': '12 December 2016'}]]
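Note that the output keeps whatever whitespace is in the document: the heading comes out as 'Date ' with a trailing space, and one cell value ends in '\n'. If that gets in the way when looking up columns, a small clean-up step can strip it. This is a minimal sketch; `clean_row` is a hypothetical helper name, and it assumes every key and value is a string.

```python
def clean_row(row):
    """Return a copy of a row dictionary with whitespace stripped from keys and values."""
    return {key.strip(): value.strip() for key, value in row.items()}


# Two fields copied from the output above
row = {"Approved By": "CPA Board\n", "Date ": "22 August 2016"}
print(clean_row(row))
# prints {'Approved By': 'CPA Board', 'Date': '22 August 2016'}
```

Applying it to every row of a table is a one-liner: `[clean_row(r) for r in table_rows]`, where `table_rows` stands for one inner list of the output.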

Upvotes: 2
