Reputation: 479
I'm running a Python script task in an Azure YAML pipeline. A JSON file gets downloaded when the following URL is accessed via a browser: https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519
What I've done so far -->
- task: PythonScript@0
  name: pythonTask
  inputs:
    scriptSource: 'inline'
    script: |
      url = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
      import webbrowser
      webbrowser.open(url)
      print("The web browser opened and the file is downloaded")
Once the browser opens the URL, the file should be downloaded locally and automatically. However, when I run the above pipeline, I can't find the file anywhere on the agent machine, and I don't get any errors either.
I'm using a Windows-2019 Microsoft Hosted Agent.
How can I find the downloaded file-path inside the agent machine?
Or is there another way I can download the file from the URL without actually having to open a browser?
Upvotes: 1
Views: 7862
Reputation: 35194
How can I find the downloaded file-path inside the agent machine?
Please try the following Python script:
steps:
- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
      import urllib.request
      url = 'https://www.some_url.com/downloads'
      path = r"$(Build.ArtifactStagingDirectory)/filename.xx"
      urllib.request.urlretrieve(url, path)
Or
steps:
- script: 'pip install wget'
  displayName: 'Command Line Script'
- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
      import wget
      print('Beginning file download with wget module')
      url = 'https://www.some_url.com/downloads'
      path = r"$(Build.ArtifactStagingDirectory)"
      wget.download(url, path)
The file will then be downloaded to the target path by the Python script.
Here is a blog about using Python to download files from a URL.
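If the question is mainly where the file ends up on the agent, you can have the same inline script list the target directory after the download. This is just a minimal sketch, assuming the $(Build.ArtifactStagingDirectory) path used above:

import os

# List the staging directory so the pipeline log shows exactly which files
# were downloaded and where they are on the agent.
target_dir = r"$(Build.ArtifactStagingDirectory)"
for name in os.listdir(target_dir):
    full_path = os.path.join(target_dir, name)
    print(full_path, os.path.getsize(full_path), "bytes")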
Update:
The URL https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 opens a web page that triggers the download automatically in a browser.
So when you request it with wget or urllib.request, you will get a 403 error.
You could instead use the direct file URL to download the JSON manually.
For example: https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json
import urllib.request
url = 'https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json'
path = r"$(Build.ArtifactStagingDirectory)\agent.json"
urllib.request.urlretrieve(url, path)
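To be sure the step saved real JSON and not an error page, a small check can be appended to the same inline script. A sketch, assuming the agent.json path from the snippet above:

import json

# Parse the downloaded file; json.load will raise if the content is not valid JSON.
path = r"$(Build.ArtifactStagingDirectory)\agent.json"
with open(path, encoding="utf-8") as f:
    data = json.load(f)

# 'changeNumber' and 'values' are top-level keys in the service-tags file at the
# time of writing; adjust if the schema differs.
print("changeNumber:", data.get("changeNumber"))
print("service tag entries:", len(data.get("values", [])))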
Update2:
You could also use a Python script to pick up the download link from the website.
Sample:
steps:
- script: |
    pip install bs4
    pip install lxml
  workingDirectory: '$(build.sourcesdirectory)'
  displayName: 'Command Line Script'
- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
      from bs4 import BeautifulSoup
      from urllib.request import Request, urlopen
      import urllib.request
      req = Request("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519", headers={'User-Agent': 'Mozilla/5.0'})
      html_page = urlopen(req).read()
      a = ""
      soup = BeautifulSoup(html_page, "lxml")
      for link in soup.find_all('a', id="c50ef285-c6ea-c240-3cc4-6c9d27067d6c"):
          a = link.get('href')
      print(a)
      path = r"$(Build.sourcesdirectory)\agent.json"
      urllib.request.urlretrieve(a, path)
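Note that the anchor id in the script above is specific to the current page markup and may change. A rough alternative sketch, assuming the page still links the .json file directly via a download.microsoft.com URL, is to pick the first such link instead of relying on the id:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen, urlretrieve

# Fetch the confirmation page with a browser-like User-Agent, as above.
req = Request("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519",
              headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(urlopen(req).read(), "lxml")

# Take the first anchor whose href is a download.microsoft.com JSON link,
# instead of depending on a hard-coded id attribute.
download_url = next(
    (link['href'] for link in soup.find_all('a', href=True)
     if link['href'].startswith("https://download.microsoft.com/")
     and link['href'].endswith(".json")),
    None)
print(download_url)

if download_url:
    urlretrieve(download_url, r"$(Build.sourcesdirectory)\agent.json")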
Update3:
Another method to get the download URL:
steps:
- script: 'pip install requests'
  displayName: 'Command Line Script'
- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
      import requests
      import re
      import urllib.request
      rq = requests.get("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519")
      t = re.search(r"https://download\.microsoft\.com/download/.*?\.json", rq.text)
      a = t.group()
      print(a)
      path = r"$(Build.sourcesdirectory)\agent.json"
      urllib.request.urlretrieve(a, path)
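Since requests is already installed for this step, you could also do the download itself with requests instead of calling urllib.request again; a minimal variation on the script above:

import re
import requests

page = requests.get("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519")
match = re.search(r"https://download\.microsoft\.com/download/.*?\.json", page.text)

if match:
    # Download the JSON with requests and write it to the sources directory.
    resp = requests.get(match.group())
    resp.raise_for_status()
    with open(r"$(Build.sourcesdirectory)\agent.json", "wb") as f:
        f.write(resp.content)
    print("Saved", match.group())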
Upvotes: 1