RajAt SiNha

Reputation: 97

How can I read a .rtf file and store its contents as Python 3 strings in a list?

I have a .rtf file that I want to read and store as strings in a list using Python 3. Any package is fine, but it must work on both Windows and Linux.

I have tried striprtf, but importing read_rtf does not work:

from striprtf.striprtf import rtf_to_text
from striprtf.striprtf import read_rtf
rtf = read_rtf("file.rtf")
text = rtf_to_text(rtf)
print(text)

This code fails with the error: cannot import name 'read_rtf'

Can anyone suggest a way to get strings from a .rtf file in Python 3?

Upvotes: 9

Views: 45268

Answers (4)

user17725480

Reputation: 81

Using rtf_to_text is enough to convert RTF into a string in Python. Read the content of the RTF file, then pass it to rtf_to_text:

from striprtf.striprtf import rtf_to_text

with open("yourfile.rtf") as infile:
    content = infile.read()
    text = rtf_to_text(content)
print(text)
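
Since the question asks for a list of strings, the converted text can be split into lines afterwards. This is a minimal sketch, assuming striprtf is installed (pip install striprtf). The helper names text_to_lines and rtf_file_to_lines are made up for illustration and are not part of striprtf:

```python
from pathlib import Path


def text_to_lines(text):
    """Split plain text into a list of non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def rtf_file_to_lines(path, encoding="utf-8"):
    """Read an RTF file, convert it to plain text, and return its lines as a list."""
    # Imported here so text_to_lines stays usable without striprtf installed
    from striprtf.striprtf import rtf_to_text

    # errors="ignore" is an assumption: it skips bytes that do not decode cleanly
    raw = Path(path).read_text(encoding=encoding, errors="ignore")
    return text_to_lines(rtf_to_text(raw))
```

Calling rtf_file_to_lines("yourfile.rtf") should then return the non-empty lines as a list. As far as I know striprtf is pure Python, so this should behave the same on Windows and Linux.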

Upvotes: 8

Buddhadeb Mondal

Reputation: 175

Reading an RTF file and manipulating its contents is tricky, and the right approach depends on the file you have. I tried all of the approaches above and none of them worked; the following code finally worked for me. Note that it uses win32com to drive Microsoft Word, so it runs only on Windows with Word installed. I hope it helps anyone looking for a solution.

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
# word = DispatchEx('Word.Application') # Alternative: start a separate Word process (requires importing DispatchEx from win32com.client)
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects\10.1\power.rtf' 
doc = word.Documents.Open(FileName=path, Encoding='gbk')
 
for para in doc.paragraphs:
    print(para.Range.Text)
 
doc.Close()
word.Quit()

If you want to store the text in a single variable instead, the following code does that:

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
# word = DispatchEx('Word.Application') # Alternative: start a separate Word process (requires importing DispatchEx from win32com.client)
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects\10.1\output_5.rtf' # Use an absolute path; a relative path may fail to open
doc = word.Documents.Open(FileName=path, Encoding='gbk')

#for para in doc.paragraphs:
#    print(para.Range.Text)


content = '\n'.join([para.Range.Text for para in doc.paragraphs])

print(content)

doc.Close()
word.Quit()
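
To get a list instead of one joined string, the paragraph texts can be cleaned up first. This is a small sketch, assuming each para.Range.Text ends with Word's paragraph mark (a carriage return, plus a \x07 cell marker inside tables). clean_paragraphs is a hypothetical helper, not part of win32com:

```python
def clean_paragraphs(texts):
    """Return a list of paragraph strings without Word's trailing markers."""
    cleaned = []
    for text in texts:
        # Drop the trailing paragraph mark (\r) and table cell marker (\x07)
        stripped = text.rstrip("\r\n\x07").strip()
        if stripped:  # skip empty paragraphs
            cleaned.append(stripped)
    return cleaned


# With the document still open, it could be used like this:
# lines = clean_paragraphs(para.Range.Text for para in doc.paragraphs)
```

The call has to happen before doc.Close(), because the paragraphs are read from the open document.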

Upvotes: 1

Daniel Howard

Reputation: 31

Try using this:

from striprtf.striprtf import rtf_to_text

sample_text = "any text as a string you want"
text = rtf_to_text(sample_text)

Upvotes: 3

Binh

Reputation: 1173

Have you tried this?

with open('yourfile.rtf', 'r') as file:
    text = file.read()
print(text)

For a very large file, read it line by line instead:

with open("yourfile.rtf") as infile:
    for line in infile:
        do_something_with(line)

Upvotes: 8
