Nitiraj

Reputation: 644

Unable to download a PDF that opens in the browser viewer using chromedp (Go)

I am using chromedp to download PDF files. It works for complicated pages where the PDF loads inside an iframe (using code similar to the download_file example): I detect the iframe, load the iframe URL separately, wait for the pdfViewer, and then click the #download button. Example of a working URL: https://www.sebi.gov.in/filings/public-issues/sep-2021/tamilnad-mercantile-bank-limited_52434.html

However, I am not able to download the PDF in the simpler cases below, where I have a direct link to the PDF. The code from the download_file example just loads the document and does not trigger a download, so I tried the code below, which reads the response body directly as in the download_image example. When I open these URLs in Google Chrome they work fine; I guess they get loaded in Chrome's default PDF viewer extension. I have tried multiple chromedp and cdproto versions; two of the combinations are below.

    github.com/chromedp/cdproto v0.0.0-20240721024200-dac8efcb39ce
    github.com/chromedp/chromedp v0.9.5

And

    github.com/chromedp/cdproto v0.0.0-20240801214329-3f85d328b335
    github.com/chromedp/chromedp v0.10.0

I also tried printing the page to PDF with code similar to this example, but it produces a blank PDF.

  1. Example URL: https://www.bseindia.com/bseplus/AnnualReport/543258/74183543258.pdf This one actually downloads an HTML file rather than the original PDF. I checked this by running cat on download.pdf:

    cat download.pdf
    <!doctype html><html><body style='height: 100%; width: 100%; overflow: hidden; margin:0px; background-color: rgb(38, 38, 38);'><embed name='91302F098E174F9DE7C97CF2F96C4F5E' style='position:absolute; left: 0; top: 0;'width='100%' height='100%' src='about:blank' type='application/pdf' internalid='91302F098E174F9DE7C97CF2F96C4F5E'></body></html>

I am able to correctly download this with curl.

  2. Example URL: https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf Error: page load error net::ERR_HTTP2_PROTOCOL_ERROR

I cannot download this one with curl either; it fails with the same error. I have asked a separate Stack Overflow question about that here.

Code

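package main

import (
    "context"
    "log"
    "os"
    "time"

    "github.com/chromedp/cdproto/network"
    "github.com/chromedp/chromedp"
)

func main() {
    url1 := "https://www.bseindia.com/bseplus/AnnualReport/543258/74183543258.pdf"
    //url2 := "https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf"
    chromedpDownload(url1)
}

// chromedpDownload navigates to urlstr and tries to save the response body
// of that request as download.pdf.
func chromedpDownload(urlstr string) {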
    ctx, cancel := chromedp.NewContext(
        context.Background(),
        chromedp.WithLogf(log.Printf),
        chromedp.WithDebugf(log.Printf),
    )
    defer cancel()

    // create a timeout as a safety net to prevent any infinite wait loops
    ctx, cancel = context.WithTimeout(ctx, 60*time.Second)
    defer cancel()

    // set up a channel, so we can block later while we monitor the download
    // progress
    done := make(chan bool)

    var requestID network.RequestID

    chromedp.ListenTarget(ctx, func(v interface{}) {
        switch ev := v.(type) {
        case *network.EventRequestWillBeSent:
            log.Printf("EventRequestWillBeSent: %v: %v", ev.RequestID, ev.Request.URL)
            if ev.Request.URL == urlstr {
                requestID = ev.RequestID
            }
        case *network.EventLoadingFinished:
            log.Printf("EventLoadingFinished: %v", ev.RequestID)
            if ev.RequestID == requestID {
                close(done)
            }
        }
    })

    // all we need to do here is navigate to the download url
    if err := chromedp.Run(ctx,
        chromedp.Navigate(urlstr),
    ); err != nil {
        log.Fatal(err)
    }

    // This will block until the chromedp listener closes the channel
    <-done
    // get the downloaded bytes for the request id
    var buf []byte
    if err := chromedp.Run(ctx, chromedp.ActionFunc(func(ctx context.Context) error {
        var err error
        buf, err = network.GetResponseBody(requestID).Do(ctx)
        return err
    })); err != nil {
        log.Fatal(err)
    }

    // write the file to disk - since we hold the bytes we dictate the name and
    // location
    if err := os.WriteFile("download.pdf", buf, 0644); err != nil {
        log.Fatal(err)
    }
    log.Print("wrote download.pdf")
}

