Kiper

Reputation: 339

docx to list in python

I am trying to read a docx file and add its text to a list. I need each element of the list to be one line from the docx file.

example:

docx file:

"Hello, my name is blabla,
I am 30 years old.
I have two kids."

result:

['Hello, my name is blabla', 'I am 30 years old', 'I have two kids']

I can't get it to work.

I am using the docx2txt module from here: github link

It has only one function, process(), and it returns all the text from the docx file as a single string.

Also, I would like it to keep special characters such as ":", "-", "." and ",".

Upvotes: 2

Views: 4677

Answers (1)

Dinesh Pundkar

Reputation: 4196

The docx2txt module reads a docx file and converts it to plain text.

You need to split that output with splitlines() and store the lines in a list.
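The splitting step can be tried on its own, without a .docx file. In this minimal sketch a made-up string stands in for the value returned by docx2txt.process(), and the helper name text_to_lines is hypothetical. As the output further down shows, the converted text can contain blank lines between paragraphs, so those are filtered out. Punctuation such as ":", "-", "." and "," is kept, because splitlines() only breaks the string at line boundaries.

```python
def text_to_lines(text):
    """Split converted text into non-blank lines, trimming trailing whitespace."""
    # splitlines() breaks on \n, \r\n, \r and other line boundaries;
    # punctuation inside each line is preserved as-is
    return [line.rstrip() for line in text.splitlines() if line.strip()]


# Hypothetical stand-in for the string returned by docx2txt.process("a.docx")
sample = "Hello, my name is blabla.\n\nI am 30 years old. \n\nI have two kids.\n"
print(text_to_lines(sample))
# ['Hello, my name is blabla.', 'I am 30 years old.', 'I have two kids.']
```

The rstrip() call also removes trailing spaces such as the one after "I am 30 years old." in the output below; drop it if you want the lines exactly as extracted.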

Code (comments inline):

import docx2txt

# Extract all text from the document as a single string
text = docx2txt.process("a.docx")

# Print the raw output after converting
print("After converting text is ", text)

content = []
for line in text.splitlines():
    # Skip empty or whitespace-only lines
    if line.strip():
        # Append the line to the list
        content.append(line)

print("List is ", content)

Output:

C:\Users\dinesh_pundkar\Desktop>python c.py
After converting text is
 Hello, my name is blabla.

I am 30 years old.

I have two kids.

 List is  ['Hello, my name is blabla.', 'I am 30 years old. ', 'I have two kids.']

C:\Users\dinesh_pundkar\Desktop>
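If you would rather not depend on docx2txt, note that a .docx file is a ZIP archive whose main text lives in word/document.xml, with each paragraph stored as a w:p element and its text split across w:t runs. The sketch below reads those paragraphs using only the standard library; the function name docx_paragraphs is hypothetical. It is a simplification: headers, footers, footnotes, and tab or line-break elements inside a paragraph are not handled. To stay runnable without a real file, it builds a tiny in-memory archive containing only document.xml.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path_or_file):
    """Return the non-empty paragraph texts of a .docx as a list of strings."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    # Each w:p is a paragraph; its visible text is split across w:t runs
    for para in root.iter(W_NS + "p"):
        text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
        if text.strip():
            lines.append(text)
    return lines


# Minimal stand-in document (a real .docx contains more parts than this)
XML = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
    "<w:p><w:r><w:t>Hello, my name is blabla.</w:t></w:r></w:p>"
    "<w:p/>"
    "<w:p><w:r><w:t>I am 30 </w:t></w:r><w:r><w:t>years old.</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>I have two kids.</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", XML)
buffer.seek(0)

print(docx_paragraphs(buffer))
# ['Hello, my name is blabla.', 'I am 30 years old.', 'I have two kids.']
```

With a real file you would call docx_paragraphs("a.docx"). Each list item then corresponds to one Word paragraph rather than one line of extracted text.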

Upvotes: 8
