Reputation: 4190
I'm having trouble converting a PDF to HTML using Aspose.Pdf-Cloud v1.0.9.
Code:
public byte[] ConvertPdfToHtml(byte[] doc, string fileName)
{
    var pdfApi = new PdfApi(ConfigurationManager.AppSettings["AsposeKey"],
        ConfigurationManager.AppSettings["AsposeSID"], ConfigurationManager.AppSettings["AsposeUrl"]);
    try
    {
        var apiResponse = pdfApi.PutConvertDocument("html", null,
            Path.GetFileNameWithoutExtension(fileName) + ".html", doc);
        if (apiResponse != null && apiResponse.Status.Equals("Ok"))
        {
            return apiResponse.ResponseStream;
        }
        throw new Exception("Couldn't convert pdf - " + fileName + " to HTML...");
    }
    catch (Exception ex)
    {
        NLogger.LogError("ConvertPdfToHtml - " + ex);
        throw;
    }
}
Regardless of which PDF I upload (generated by Adobe or SelectPdf), I get a 400 Bad Request back. Has anybody had any luck getting this to work?
Aspose.Words has worked great for me for doc / docx to html so far.
Update: After logging into the account, it looks like there's an error being generated behind the scenes:
Error: The method or operation is not implemented.. Method: Convert document to the format specified on-line.. Parameters: format 'html',url '',outPath 'testadobe.html'
This might be an Aspose SDK issue. I'll try contacting them, as the method is exposed in the SDK and does exactly what I need with docs; I just need it to work with PDFs too.
Updated code:
public byte[] ConvertPdfToHtml(byte[] doc, string fileName)
{
    var pdfApi = new PdfApi(ConfigurationManager.AppSettings["AsposeKey"],
        ConfigurationManager.AppSettings["AsposeSID"], ConfigurationManager.AppSettings["AsposeUrl"]);
    var storageApi = new StorageApi(ConfigurationManager.AppSettings["AsposeKey"],
        ConfigurationManager.AppSettings["AsposeSID"], ConfigurationManager.AppSettings["AsposeUrl"]);
    try
    {
        storageApi.PutCreate(fileName, "", "", doc);
        var apiResponse = pdfApi.GetDocumentWithFormat(fileName, "html", "", "", Path.GetFileNameWithoutExtension(fileName) + ".html");
        if (apiResponse != null && apiResponse.Status.Equals("Ok"))
        {
            var storageRes = storageApi.GetDownload(Path.GetFileNameWithoutExtension(fileName) + ".html", null, "");
            var htmlDoc = ZipExtractor.ExtractHtmlFromZip(storageRes.ResponseStream,
                Path.GetFileNameWithoutExtension(fileName) + ".html");
            return htmlDoc;
        }
        throw new Exception("Couldn't convert pdf - " + fileName + " to HTML...");
    }
    catch (Exception ex)
    {
        NLogger.LogError("ConvertPdfToHtml - " + ex);
        throw;
    }
}
Unzip function for posterity:
// Requires: using System; using System.IO; using System.IO.Compression;
public static byte[] ExtractHtmlFromZip(byte[] zipBytes, string fileName)
{
    if (zipBytes == null) throw new ArgumentNullException("zipBytes");
    using (var zipStream = new MemoryStream(zipBytes))
    using (var archive = new ZipArchive(zipStream))
    {
        foreach (var zipEntry in archive.Entries)
        {
            // Skip non-matching entries instead of failing on the first one.
            if (zipEntry.FullName != fileName) continue;
            using (var fileStream = zipEntry.Open())
            using (var ms = new MemoryStream())
            {
                fileStream.CopyTo(ms);
                return ms.ToArray();
            }
        }
    }
    // Only reached after every entry in the archive has been checked.
    throw new FileNotFoundException("Couldn't find " + fileName + " in zip archive...");
}
Upvotes: 0
Views: 1334
Reputation: 325
We have two APIs to convert a PDF document to HTML.
I recommend using the first one. The following cURL example shows how to call the API.
curl -v "http://api.aspose.cloud/v1.1/pdf/Sample.pdf?format=html&appSID=B01A15E5-1B83-4B9A-8EB3-0F2BFA6AC766&signature=hHUw2HKmLY6tQFEevDg52uOLKak" \
-X GET \
-H "Content-Type: application/json" \
-H "Accept: multipart/form-data" \
-o Sample_out.zip
As you may have observed, I set the output (-o) file extension to .zip instead of .html. The reason is that the conversion produces multiple files (.html, .css, and image files), so the API zips them into a single archive.
This cURL example used Sample.pdf as a resource file.
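Since the output is a zip containing several files, the caller has to pull the HTML entry out of the archive. Below is a minimal sketch of one way to do that with System.IO.Compression. The class ZipHtmlSketch and the helpers ExtractFirstHtml and AddEntry are hypothetical names used for illustration, not part of any Aspose SDK, and the sketch assumes the archive holds at least one entry ending in .html. Main builds a small in-memory archive as a stand-in for the downloaded Sample_out.zip.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

public static class ZipHtmlSketch
{
    // Hypothetical helper: returns the bytes of the first entry whose name
    // ends in ".html", or null if the archive contains no such entry.
    public static byte[] ExtractFirstHtml(byte[] zipBytes)
    {
        using (var zipStream = new MemoryStream(zipBytes))
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            var entry = archive.Entries.FirstOrDefault(
                e => e.FullName.EndsWith(".html", StringComparison.OrdinalIgnoreCase));
            if (entry == null) return null;

            using (var entryStream = entry.Open())
            using (var buffer = new MemoryStream())
            {
                entryStream.CopyTo(buffer);
                return buffer.ToArray();
            }
        }
    }

    public static void Main()
    {
        // Stand-in for the downloaded zip: one stylesheet and one HTML file.
        byte[] zipBytes;
        using (var ms = new MemoryStream())
        {
            using (var archive = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
            {
                AddEntry(archive, "style.css", "body{}");
                AddEntry(archive, "Sample.html", "<html></html>");
            }
            zipBytes = ms.ToArray();
        }

        var html = ExtractFirstHtml(zipBytes);
        Console.WriteLine(Encoding.UTF8.GetString(html));
    }

    // Hypothetical helper: writes one text entry into the archive.
    private static void AddEntry(ZipArchive archive, string name, string content)
    {
        var entry = archive.CreateEntry(name);
        using (var writer = new StreamWriter(entry.Open()))
        {
            writer.Write(content);
        }
    }
}
```

Matching on the .html extension rather than an exact file name avoids guessing how the service names the entry, at the cost of picking an arbitrary file if the archive ever contains more than one HTML page.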
P.S. I work with Aspose as a Developer Evangelist.
Upvotes: 1