SD4
SD4

Reputation: 479

Download file from a URL in Azure Pipelines Microsoft hosted agent

I'm running a Python script task in Azure YAML pipeline. A JSON file gets downloaded when the URL is accessed via a browser. URL - https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519

What I've done so far -->

- task: PythonScript@0
  name: pythonTask
  inputs:
    scriptSource: 'inline'
    script: |

      url = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"

      import webbrowser
      webbrowser.open(url)
      print("The web browser opened and the file is downloaded")

Once the browser opens URL, a file should get downloaded locally, automatically. However, on running the above pipeline, I cant seem to find the file anywhere on the agent machine. I do not get any errors as well.

I'm using a Windows-2019 Microsoft Hosted Agent.

How can I find the downloaded file-path inside the agent machine?

Or is there another way I can download file from the URL without actually having to open the browser?

Upvotes: 1

Views: 7862

Answers (1)

Kevin Lu-MSFT
Kevin Lu-MSFT

Reputation: 35194

How can I find the downloaded file-path inside the agent machine?

Please try the following Python script:

steps:
- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
     import urllib.request
     

     
     url = 'https://www.some_url.com/downloads'
     
     path = r"$(Build.ArtifactStagingDirectory)/filename.xx"
     urllib.request.urlretrieve(url, path)

Or

steps:
- script: 'pip install wget'
  displayName: 'Command Line Script'

- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
     import wget
     
     print('Beginning file download with wget module')
     
     url = 'https://www.some_url.com/downloads'
     path = r"$(Build.ArtifactStagingDirectory)"
     wget.download(url, path)

Then the file will be downloaded to the target path in Python script.

Here is a blog about use Python download files from url

Update:

The url: microsoft.com/en-us/download/confirmation.aspx?id=56519 needs to open the web page and the file will automatic downloaded.

So when you use the wget or urllib.request, you will get the 403 error.

You could change to use the site url to manually download the json file.

enter image description here

For example: url: https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json

import urllib.request

url = 'https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json'

path = r"$(Build.ArtifactStagingDirectory)\agent.json"
urllib.request.urlretrieve(url, path)

Update2:

You could use Python script to get the download in the website.

Sample:

steps:
- script: |
   pip install bs4
   
   pip install lxml
  workingDirectory: '$(build.sourcesdirectory)'
  displayName: 'Command Line Script'

- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
     from bs4 import BeautifulSoup
     from urllib.request import Request, urlopen
     import re
     import urllib.request
     
     
     req = Request("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519" , headers={'User-Agent': 'Mozilla/5.0'})
     html_page = urlopen(req).read()
     
     a=""
     soup = BeautifulSoup(html_page, "lxml")
     
     for link in soup.find_all('a' , id="c50ef285-c6ea-c240-3cc4-6c9d27067d6c"):
         
          a= link.get('href')
          print(a)
     
     
     
     path = r"$(Build.sourcesdirectory)\agent.json"
     urllib.request.urlretrieve(a, path)

Result:

enter image description here

Update3:

Another method to get the download URL:

steps:
- script: 'pip install requests'
  displayName: 'Command Line Script'

- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
     import requests
     import re
     import urllib.request
     
     rq= requests.get("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519")
      
     t = re.search("https://download.microsoft.com/download/.*?\.json", rq.text )
      
     
     
     a= t.group()
     
     print(a)
     
     path = r"$(Build.sourcesdirectory)\agent.json"
     urllib.request.urlretrieve(a, path)
     

Upvotes: 1

Related Questions