ThePumpkinMaster

Reputation: 2351

pdf2json gives me a blank output txt file?

I am following the "Code Example" guide on their GitHub page: https://github.com/modesty/pdf2json#code-example

For the example titled "Parse a PDF then write a .txt file (which only contains textual content of the PDF)", I copied and pasted the exact implementation into a local JavaScript file and ran it, but the output text file was completely blank.

'use strict';

let fs = require('fs');
let PDFParser = require("pdf2json");

// No constructor arguments, exactly as in the README example
let pdfParser = new PDFParser();

pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError) );
pdfParser.on("pdfParser_dataReady", pdfData => {
    // Write only the text content of the PDF; recent Node.js versions require the callback
    fs.writeFile("./node_modules/pdf2json/test/F1040EZ.content.txt", pdfParser.getRawTextContent(), err => {
        if (err) console.error(err);
    });
});

pdfParser.loadPDF("./node_modules/pdf2json/test/pdf/fd/form/F1040EZ.pdf");

Am I doing something wrong, or is this broken on their end? Also, are there any alternative PDF-to-text converters for Node.js that don't require installing additional binaries?

Upvotes: 7

Views: 5421

Answers (1)

xdvarpunen

Reputation: 356

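The front-page documentation is a bit wrong! To make this work, pass two arguments to the PDFParser constructor: a context (null works; the code below passes this) and 1. The second argument (needRawText) tells pdf2json to collect the raw text; without it, getRawTextContent() has nothing to return, so the output file is empty.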

This one works:

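var fs = require("fs");

// https://github.com/modesty/pdf2json
var PDFParser = require("pdf2json");

// Second argument (needRawText = 1) makes pdf2json collect the raw text;
// without it getRawTextContent() returns an empty string
var pdfParser = new PDFParser(this, 1);

pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
pdfParser.on("pdfParser_dataReady", pdfData => {
    fs.writeFile("./content.txt", pdfParser.getRawTextContent(), err => {
        if (err) console.error(err);
    });
});

// Nothing is parsed until a PDF is actually loaded
pdfParser.loadPDF("./node_modules/pdf2json/test/pdf/fd/form/F1040EZ.pdf");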
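Because pdf2json reports its result through events, it can help to wrap the flow in a Promise so that a missing-text problem surfaces as an error instead of a silently blank file. This is a minimal sketch, assuming only the event names and methods that appear in the code above ("pdfParser_dataReady", "pdfParser_dataError", loadPDF() and getRawTextContent()). The helper names extractRawText and pdfToTextFile are hypothetical, and treating an empty string as an error is a design choice: a scanned PDF with no text layer would also produce empty text.

```javascript
"use strict";

const fs = require("fs");

// Hypothetical helper: wraps pdf2json's event-based API in a Promise.
// Assumes `parser` emits "pdfParser_dataReady" / "pdfParser_dataError"
// and exposes loadPDF() and getRawTextContent(), as in the answer above.
function extractRawText(parser, pdfPath) {
  return new Promise((resolve, reject) => {
    parser.once("pdfParser_dataError", errData => reject(errData.parserError));
    parser.once("pdfParser_dataReady", () => {
      const text = parser.getRawTextContent();
      if (!text) {
        // Empty text may mean the parser was created without needRawText,
        // e.g. new PDFParser() instead of new PDFParser(this, 1),
        // or that the PDF simply has no text layer.
        reject(new Error("empty raw text; was PDFParser created with needRawText = 1?"));
        return;
      }
      resolve(text);
    });
    parser.loadPDF(pdfPath);
  });
}

// Hypothetical helper: extracts the text and writes it to outPath,
// propagating write errors instead of ignoring them.
async function pdfToTextFile(parser, pdfPath, outPath) {
  const text = await extractRawText(parser, pdfPath);
  await fs.promises.writeFile(outPath, text);
  return text;
}

module.exports = { extractRawText, pdfToTextFile };
```

Usage would look something like pdfToTextFile(new PDFParser(this, 1), "./some.pdf", "./content.txt").catch(console.error), where the paths are placeholders.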

HTH -XDVarpunen

Link to issue in pdf2json: https://github.com/modesty/pdf2json/issues/76

Upvotes: 14
