Reputation: 33
I'm new to .NET. I have a PDF that contains three tables (with purchase details). My task is to extract all three tables from the PDF and convert each one into an Excel sheet (three Excel sheets in total) using C# code. I googled for three days, and all I could find was code to extract the text from the PDF, but without any formatting. I can't purchase any third-party tools. I need a way to at least extract the text in proper table format, which I will then convert to Excel using Interop, or else code that converts directly to Excel. Whatever the solution is, I need it urgently. Please help.
Upvotes: 1
Views: 4662
Reputation: 337
I suggest you look at xpdf. It has a command-line interface, and you can use it to obtain a text file from your PDF. Most importantly, for columnar data xpdf produces a well-spaced text file, so you can easily read your data using Substring() or, in the worst case, with regular expressions. In the simplest case you can import the text output directly into Excel as a text file with "fixed width fields".
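As a rough sketch of the regular-expression route: assuming the text file was produced with something like `pdftotext -layout input.pdf output.txt`, and that columns are separated by at least two spaces while values inside a cell contain only single spaces, each line can be split into cells and written out as CSV, which Excel opens directly. The class name `LayoutTextTable`, the method names, and the two-space threshold are my own assumptions, not part of xpdf.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LayoutTextTable
{
    // Assumption: two or more consecutive spaces mark a column gap.
    private static readonly Regex ColumnGap = new Regex(@" {2,}");

    // Split one line of layout-preserved text into cell values.
    public static string[] SplitColumns(string line)
    {
        return ColumnGap.Split(line.Trim());
    }

    // Quote a cell for CSV if it contains a comma, a quote, or a line break.
    private static string CsvEscape(string cell)
    {
        if (cell.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0) return cell;
        return "\"" + cell.Replace("\"", "\"\"") + "\"";
    }

    // Convert the non-blank text lines into CSV lines.
    public static List<string> ToCsvLines(IEnumerable<string> lines)
    {
        return lines
            .Where(l => !string.IsNullOrWhiteSpace(l))
            .Select(l => string.Join(",", SplitColumns(l).Select(CsvEscape)))
            .ToList();
    }
}
```

You could then call `File.WriteAllLines("table.csv", LayoutTextTable.ToCsvLines(File.ReadLines("output.txt")))`. Note that this simple split cannot detect empty cells: a row with a blank column will come out one cell short, and in that case the fixed-position Substring() approach is the safer one.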
Upvotes: 1
Reputation: 15408
iTextPDF supports C# for extracting information from a PDF. However, to answer whether a table can be extracted:
As described above: you can't get fields from a PDF that looks like a form, if the PDF isn't a form from a technical point of view; you can't get a table from a PDF that looks like a table, if the tabular structure (using tags) is missing inside the PDF.
I got this from their support panel.
Upvotes: 2