I am trying to download data from a URL. I created a Docker image called download-files that downloads CSV files. When I run the image directly with docker run, the files download without problems. However, when I use the same image from a CWL tool and execute it with cwltool, I get an error, and the script prints the message that the Internet connection is not available.
This is the CWL file that I executed:
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: download-files
inputs:
  url:
    type: string
    inputBinding:
      position: 1
outputs:
  output_csv:
    type: File
    outputBinding:
      glob: "*.csv"
This is the Dockerfile that I created to build the image that downloads the files:
FROM python:3.9-slim
WORKDIR /app
COPY ExtractData.py .
RUN pip install requests
ENTRYPOINT ["python", "/app/ExtractData.py"]
This is the Python script (ExtractData.py):
import requests
import sys

def download_file(url):
    response = requests.get(url)
    if response.status_code == 200:
        filename = url.split('/')[-1]
        with open(filename, 'wb') as f:
            f.write(response.content)
        print(f"File downloaded: {filename}")
    else:
        print(f"Failed to download file from {url}")

def check_internet_connection():
    try:
        response = requests.get("https://www.google.com", timeout=3)
        response.raise_for_status()
        return True
    except (requests.exceptions.RequestException, requests.exceptions.Timeout) as e:
        return False

if __name__ == "__main__":
    print("arguments passed to the script are :",sys.argv)
    if len(sys.argv) != 2:
        print("Usage: python download_files.py <URL>")
        sys.exit(1)
    url = sys.argv[1]
    if check_internet_connection():
        print("Internet connection is available.")
    else:
        print("Internet connection is not available.")
    download_file(url)
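The check_internet_connection function above catches the exception and returns only False, so the output does not say why the request failed inside the container (DNS failure, unreachable network, timeout, HTTP error). For debugging, a variant that also returns the reason might look like the sketch below. This is not part of my original script: the name diagnose_connection and the opener parameter are hypothetical, and it uses the standard-library urllib instead of requests so that it has no extra dependencies.

```python
import urllib.error
import urllib.request


def diagnose_connection(url="https://www.google.com", timeout=3,
                        opener=urllib.request.urlopen):
    """Return (ok, reason) instead of a bare boolean.

    `opener` is a hypothetical injection point so the function can be
    exercised without real network access; by default it is urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server was reached but answered with an error status.
        return False, f"HTTP error {exc.code}"
    except urllib.error.URLError as exc:
        # Typically a DNS or routing problem; exc.reason holds the cause.
        return False, f"URL error: {exc.reason!r}"
    except OSError as exc:
        # Covers socket timeouts and other low-level network errors.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling ok, reason = diagnose_connection() in the main block and printing reason when ok is False would show the underlying error in the cwltool output rather than only "Internet connection is not available."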
The command that I used to execute the CWL file (this is not working):
cwltool dataset_extraction.cwl --url https://data.cdc.gov/resource/bugr-bbfr.csv
The command that I used to run the Docker image directly (this is working):
docker run download-files https://data.cdc.gov/resource/bugr-bbfr.csv