Reputation: 97
I have an .rtf file and I want to read it and store its strings in a list using Python 3. Any package is fine, but it must work on both Windows and Linux.
I have tried striprtf but read_rtf is not working.
from striprtf.striprtf import rtf_to_text
from striprtf.striprtf import read_rtf
rtf = read_rtf("file.rtf")
text = rtf_to_text(rtf)
print(text)
But this code fails with the error: cannot import name 'read_rtf'
Can anyone suggest a way to get the strings from an .rtf file in Python 3?
Upvotes: 9
Views: 45268
Reputation: 81
Using rtf_to_text is enough to convert RTF into a string in Python.
Read the content from an RTF file and then feed it to rtf_to_text:
from striprtf.striprtf import rtf_to_text
with open("yourfile.rtf") as infile:
    content = infile.read()
text = rtf_to_text(content)
print(text)
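Since the question asks for a list of strings, here is a minimal sketch of one way to get there. The helper names text_to_lines and rtf_file_to_lines are made up for this illustration, and the utf-8 encoding with errors="ignore" is an assumption about the file, so adjust it to match yours. It only uses rtf_to_text from striprtf plus the standard library, so it should behave the same on Windows and Linux.

```python
from pathlib import Path


def text_to_lines(text):
    """Split plain text into a list of non-empty, whitespace-stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def rtf_file_to_lines(path, encoding="utf-8"):
    """Read an RTF file and return its visible text as a list of strings.

    Hypothetical helper: the encoding default is an assumption about the file.
    """
    # Imported here so text_to_lines can be used without striprtf installed.
    from striprtf.striprtf import rtf_to_text

    raw = Path(path).read_text(encoding=encoding, errors="ignore")
    return text_to_lines(rtf_to_text(raw))


# The splitting step on its own, using plain text in place of converted RTF.
lines = text_to_lines("first line\n\n  second line  \n")
print(lines)
```

With a real file you would call rtf_file_to_lines("yourfile.rtf") and get one list entry per non-empty line of text.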
Upvotes: 8
Reputation: 175
Reading an RTF file and manipulating the data inside it is tricky, and what works depends on the file you have. I tried all of the approaches above and none of them worked for me; the following code finally did. I hope it helps anyone else hunting for a solution.
from win32com.client import Dispatch
word = Dispatch('Word.Application') # Open word application
# word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
path = r'C:\Projects\10.1\power.rtf'
doc = word.Documents.Open(FileName=path, Encoding='gbk')
for para in doc.paragraphs:
    print(para.Range.Text)
doc.Close()
word.Quit()
If you want to store the whole text in a single variable, the following code will do it.
from win32com.client import Dispatch
word = Dispatch('Word.Application') # Open word application
# word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
path = r'C:\Projects\10.1\output_5.rtf' # Use an absolute path; a relative path will fail
doc = word.Documents.Open(FileName=path, Encoding='gbk')
#for para in doc.paragraphs:
# print(para.Range.Text)
content = '\n'.join([para.Range.Text for para in doc.paragraphs])
print(content)
doc.Close()
word.Quit()
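If you go the Word route and want a list rather than one joined string, the paragraph texts usually need a little cleanup first. The sketch below assumes each para.Range.Text value ends with a carriage return (the paragraph mark), which is how Word typically reports it. clean_paragraphs is a made-up helper name, and the sample list stands in for the values you would collect from doc.paragraphs.

```python
def clean_paragraphs(paragraph_texts):
    """Strip trailing paragraph marks and drop paragraphs that end up empty."""
    cleaned = []
    for text in paragraph_texts:
        stripped = text.rstrip("\r\n")
        if stripped:
            cleaned.append(stripped)
    return cleaned


# Stand-in for [para.Range.Text for para in doc.paragraphs]
sample = ["First paragraph\r", "\r", "Second paragraph\r"]
paragraphs = clean_paragraphs(sample)
print(paragraphs)
```

Keep in mind that win32com needs Windows with Word installed, so this route does not cover the Linux half of the question.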
Upvotes: 1
Reputation: 31
Try using this:
from striprtf.striprtf import rtf_to_text
sample_text = "any text as a string you want"
text = rtf_to_text(sample_text)
Upvotes: 3
Reputation: 1173
Have you tried this?
with open('yourfile.rtf', 'r') as file:
    text = file.read()
print(text)
For a super large file, try this:
with open("yourfile.rtf") as infile:
    for line in infile:
        # do_something_with is a placeholder for your own per-line handler
        do_something_with(line)
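One thing to be aware of with plain open(): it returns the raw RTF source, control words included, not the visible text. The sketch below shows this with a tiny hand-written RTF document saved to a temporary file. SAMPLE_RTF and iter_raw_lines are names invented for this illustration.

```python
import os
import tempfile

# A tiny hand-written RTF document, used only for this illustration.
SAMPLE_RTF = "{\\rtf1\\ansi\nHello\\par\nWorld\\par\n}\n"


def iter_raw_lines(path):
    """Yield the file's lines one at a time without loading it all into memory."""
    with open(path, "r", encoding="utf-8", errors="ignore") as infile:
        for line in infile:
            yield line.rstrip("\n")


# Write the sample to a temporary .rtf file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".rtf", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(SAMPLE_RTF)
    tmp_path = tmp.name

raw_lines = list(iter_raw_lines(tmp_path))
os.remove(tmp_path)

# Each entry still carries RTF markup such as \rtf1 and \par.
print(raw_lines)
```

So line-by-line reading keeps memory use low, but the lines still have to go through something like rtf_to_text before they are usable as plain strings.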
Upvotes: 8