Reputation: 24713
I want to do it in C#, entirely in-process (no Process.Start()), and for free. The output could be RTF, HTML, or whatever else, as long as I can open it in Word, save it as RTF, and then load that into a RichTextBox.
I'm aware similar questions have flooded this forum over the years, but none of them seems to address what I am asking.
EDIT:
Looks like it can be done here: http://www.itextpdf.com/examples/iia.php?id=275
Upvotes: 2
Views: 3111
Reputation: 498914
Use a PDF library, such as iTextSharp, to parse the PDF. You will be able to access all text and images from the PDF and convert them to whatever representation you want.
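As a rough illustration of the text part, here is a minimal sketch assuming iTextSharp 5.x (the `iTextSharp.text.pdf.parser` namespace). The class name `PdfTextDump` and the command-line argument layout are made up for the example. It only pulls out plain text, so images and most layout are lost.

```csharp
using System;
using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper for illustration; assumes iTextSharp 5.x is referenced.
static class PdfTextDump
{
    // Returns the text of every page, with a blank line between pages.
    public static string ExtractText(string pdfPath)
    {
        var sb = new StringBuilder();
        var reader = new PdfReader(pdfPath);
        try
        {
            // iTextSharp page numbers are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Use a fresh strategy per page so text does not accumulate across pages.
                string pageText = PdfTextExtractor.GetTextFromPage(
                    reader, page, new SimpleTextExtractionStrategy());
                sb.AppendLine(pageText);
                sb.AppendLine();
            }
        }
        finally
        {
            reader.Close();
        }
        return sb.ToString();
    }

    // Usage (assumed): PdfTextDump.exe input.pdf output.txt
    static void Main(string[] args)
    {
        string text = ExtractText(args[0]);
        if (string.IsNullOrWhiteSpace(text))
        {
            // No extractable text; the pages may be scanned images.
            Console.Error.WriteLine("No text found in " + args[0]);
        }
        File.WriteAllText(args[1], text);
    }
}
```

The resulting file can be loaded into a RichTextBox with `LoadFile(path, RichTextBoxStreamType.PlainText)`, which skips the round trip through Word if plain text is good enough.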
There are other solutions, such as installing xpdf and shelling out to it; it will convert to HTML if the right command-line arguments are passed in.
Upvotes: 3
Reputation: 5222
I am not sure Word can open a PDF unless the PDF was created from a Word document.
I think the only quick solution would be to purchase or find a third-party library that does PDF handling, then use its API to pull out the text you need. In any case, I am sure the text would be extremely badly formatted at that point. Also be aware that some PDFs that appear to show text actually store it as an image, so there would be no way to get the text out directly.
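If badly formatted plain text is acceptable, it can be wrapped in a bare-bones RTF document without any library at all. The sketch below is only an assumption about what "good enough" might look like: the helper name `PlainTextToRtf.Wrap` and the choice of Courier New at 10 pt are made up for illustration, and no attempt is made to restore the original layout.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of any PDF library.
static class PlainTextToRtf
{
    // Wraps plain text in a minimal RTF document using a single monospaced font.
    public static string Wrap(string text)
    {
        var sb = new StringBuilder(@"{\rtf1\ansi\deff0{\fonttbl{\f0 Courier New;}}\f0\fs20 ");
        foreach (char c in text.Replace("\r\n", "\n"))
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                // Backslash and braces are RTF syntax, so they must be escaped.
                sb.Append('\\').Append(c);
            }
            else if (c == '\n')
            {
                // Each line break becomes an RTF paragraph break.
                sb.Append(@"\par ");
            }
            else if (c > 127)
            {
                // Non-ASCII: \uN takes a signed 16-bit value, followed by a fallback character.
                sb.Append(@"\u").Append(unchecked((short)c)).Append('?');
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.Append('}').ToString();
    }
}
```

In a WinForms application the result can be assigned directly, for example `richTextBox1.Rtf = PlainTextToRtf.Wrap(extractedText);`, where `richTextBox1` and `extractedText` are placeholder names.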
Upvotes: 0