Cody

Reputation: 23

Web.Contents stuck or times out while trying to download a file

I cannot download a small xlsx file from a website with Power Query: the request gets stuck or times out. The file is the Rig Count Summary on this site. I got the URL by right-clicking the link and choosing "Copy link", then used it in Power Query:


let
    Source = Excel.Workbook(Web.Contents("https://rigcount.bakerhughes.com/static-files/3ba17f6e-62be-454c-bbd9-806996a7d991"), null, true)
in
    Source


Upvotes: 1

Views: 507

Answers (2)

David Browne - Microsoft

Reputation: 89036

The web server receives the HTTP request and can behave differently depending on which HTTP headers are present in it. By trial and error, you can copy request headers from a working request (one you inspect in the browser's debug mode or in Fiddler) and add them to Web.Contents. Doing that here gives something like:

let
    // Request headers copied from a working browser request
    headers = [
        #"User-Agent" = "Mozilla/5.0 (iPad; CPU OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/87.0.4280.77 Mobile/15E148 Safari/604.1 Edg/108.0.0.0",
        #"Accept-Encoding" = "gzip, deflate"
    ],
    // Download the file, sending the headers with the request
    Source = Web.Contents("https://rigcount.bakerhughes.com/static-files/3ba17f6e-62be-454c-bbd9-806996a7d991", [Headers = headers]),
    Data = Excel.Workbook(Source),
    // Pick the sheet with the current week's summary
    RigCountSummary_CurrentWeek_Sheet = Data{[Item = "RigCountSummary_CurrentWeek", Kind = "Sheet"]}[Data]
in
    RigCountSummary_CurrentWeek_Sheet

Without the Accept-Encoding and User-Agent headers, the request hangs on the web server. This is probably a bug in that web application rather than an anti-scraping measure, since you wouldn't intentionally make incoming requests hang for a long time.
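
If you would rather have a hanging request fail quickly than wait for the default timeout, Web.Contents also accepts a Timeout option. The following is only a sketch: FetchWithHeaders and the 30-second limit are illustrative choices, not something the site requires.

```
let
    // Hypothetical helper: Web.Contents with the headers above plus an explicit timeout
    FetchWithHeaders = (url as text, timeoutSeconds as number) as binary =>
        Web.Contents(
            url,
            [
                Headers = [
                    #"User-Agent" = "Mozilla/5.0 (iPad; CPU OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/87.0.4280.77 Mobile/15E148 Safari/604.1 Edg/108.0.0.0",
                    #"Accept-Encoding" = "gzip, deflate"
                ],
                // Give up after timeoutSeconds instead of waiting for the default timeout
                Timeout = #duration(0, 0, 0, timeoutSeconds)
            ]
        ),
    // 30 seconds is an arbitrary value chosen for illustration
    Source = FetchWithHeaders("https://rigcount.bakerhughes.com/static-files/3ba17f6e-62be-454c-bbd9-806996a7d991", 30),
    Data = Excel.Workbook(Source)
in
    Data
```

With a short timeout, a missing or wrong header shows up as a quick error while you experiment, rather than a long wait.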

Upvotes: 3

Peter

Reputation: 12315

The problem is that the site doesn't provide a direct link to the Excel file.

  • When you click the xlsx link, JavaScript starts the download of Rig Count Summary_121622.xlsx.
  • When you copy the xlsx link, you get https://rigcount.bakerhughes.com/static-files/4ef2cc30-b5a4-4b91-856a-499467858baa, which is not an Excel file.
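
One way to check what a copied link actually returns is to look at the first bytes of the response. An xlsx file is a ZIP archive, so it should start with the bytes "PK". This is a rough sketch that assumes the request completes at all; the step names are illustrative.

```
let
    Url = "https://rigcount.bakerhughes.com/static-files/4ef2cc30-b5a4-4b91-856a-499467858baa",
    // Buffer the response so it is downloaded once and can be inspected
    Response = Binary.Buffer(Web.Contents(Url)),
    // Take the first two bytes as a list of numbers
    FirstBytes = Binary.ToList(Binary.Range(Response, 0, 2)),
    // ZIP-based files such as xlsx begin with "PK" (0x50, 0x4B = 80, 75)
    LooksLikeXlsx = (FirstBytes = {80, 75})
in
    LooksLikeXlsx
```

If the result is false, the response is something other than an xlsx file (for example an HTML page), and Excel.Workbook will not be able to read it.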

Upvotes: 0
