user2740323

Reputation: 33

Convert a PDF file into Excel sheets

I'm new to .NET. I have a PDF that contains three tables (with purchase details). My task is to extract all three tables from the PDF and convert each one into an Excel sheet (three Excel sheets) using C# code. I googled for three days, but all I could find was code that extracts the text from a PDF without any formatting. I can't purchase any third-party tools. I need a way to at least extract the text in a proper table layout, which I will then convert to Excel using Interop, or code that converts directly to Excel. Whatever the solution is, I need it urgently. Please help.

Upvotes: 1

Views: 4662

Answers (2)

Massimo Fuccillo

Reputation: 337

I suggest you look at xpdf. It has a command-line interface that can produce a text file from your PDF. Most importantly, when the page contains columns, xpdf produces a well-spaced text file, so you can easily read your data using Substring() or, in the worst case, regular expressions. In the simplest case, you can import the text output directly into Excel as a text file with "fixed width fields".
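As a rough sketch of the parsing step, assume the text file was produced with something like `pdftotext -layout purchases.pdf purchases.txt` (the file names are hypothetical) and that columns are separated by at least two spaces. That separator rule is my assumption, not something xpdf guarantees. The class and method names below are made up for illustration. The code splits each line into cells and writes them out as CSV, which Excel can open directly:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: turns layout-preserved text lines into CSV lines.
public static class LayoutTextParser
{
    // Assumption: a run of two or more whitespace characters marks a
    // column boundary; a single space stays inside the cell.
    public static string[] SplitColumns(string line)
    {
        return Regex.Split(line.Trim(), @"\s{2,}");
    }

    // Quotes every cell and doubles embedded quotes so that commas
    // inside a cell do not break the CSV row.
    public static string ToCsvLine(string[] cells)
    {
        return string.Join(",",
            cells.Select(c => "\"" + c.Replace("\"", "\"\"") + "\""));
    }

    // Converts all non-blank lines; blank lines are skipped.
    public static List<string> ToCsv(IEnumerable<string> lines)
    {
        return lines
            .Where(l => !string.IsNullOrWhiteSpace(l))
            .Select(l => ToCsvLine(SplitColumns(l)))
            .ToList();
    }
}
```

You could then call it with something like `File.WriteAllLines("purchases.csv", LayoutTextParser.ToCsv(File.ReadLines("purchases.txt")))`. Note that if a row has an empty cell, this splitting shifts the remaining cells to the left. In that case, reading fixed character positions with Substring() is probably the safer route.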

Upvotes: 1

Sage

Reputation: 15408

itextpdf has C# support for extracting information from a PDF. However, to answer whether we can extract a table:

As described above: you can't get fields from a PDF that looks like a form, if the PDF isn't a form from a technical point of view; you can't get a table from a PDF that looks like a table, if the tabular structure (using tags) is missing inside the PDF.

I got this from their support panel.

Upvotes: 2
