Reputation: 20342
I have a Python script that runs perfectly fine on my laptop. I am trying to move it to Azure and run it there. Ideally, I would like to do some basic screen scraping, some basic transformation, and then save the data files in the Data Lake or maybe the Storage Explorer (the lake is probably better). So, I set up 'Create Automation Account' and 'Run As Account'. Now I am trying to run the code (hitting the Start button) in an Azure 'Runbook', and I am getting this error message.
Failed
Traceback (most recent call last):
  File "C:\Temp\3fgngmon.o45\7e326422-ff39-4a2c-93f9-4afafd46205c", line 2, in <module>
    from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'
Here is my sample code.
import requests
from bs4 import BeautifulSoup
from urllib.parse import unquote
import csv
import io

all_links = [
    "/vsoch/hospital-chargemaster/tree/0.0.2/data/ochsner-clinic-foundation",
    "/vsoch/hospital-chargemaster/tree/0.0.2/data/ohio-state-university-hospital",
    "/vsoch/hospital-chargemaster/tree/0.0.2/data/orlando-health",
    "/vsoch/hospital-chargemaster/tree/0.0.2/data/st.-joseph%E2%80%99s-hospital-(tampa)",
]

for item in all_links:
    item = item.replace('tree/', '')
    try:
        file_name = unquote(item.split('/')[-1])
        DOWNLOAD_URL = f'https://raw.githubusercontent.com{item}/data-latest.tsv'
        r_tsv = requests.get(DOWNLOAD_URL)
        if r_tsv.status_code == 404:
            print(f"Not found - {DOWNLOAD_URL}")
        else:
            print(f"Downloaded - {DOWNLOAD_URL}")
            data = list(csv.reader(io.StringIO(r_tsv.text), delimiter='\t'))
            DOWNLOAD_PATH = fr'C:\Users\ryans\Desktop\hospital_data\{file_name}.csv'
            with open(DOWNLOAD_PATH, 'w', newline='') as f_output:
                csv_output = csv.writer(f_output)
                csv_output.writerows(data)
    except Exception as e:
        print(e)
I think I need to do a pip install somehow, but I am not sure how to do that in Azure. I also need to change the save path to the Data Lake (or Storage Explorer), and I am not sure how to do that either. How can I get this up and running?
Upvotes: 0
Views: 1027
Reputation: 3814
You need to import the Beautiful Soup package (beautifulsoup4, which provides the bs4 module named in the error) into your Automation account.
In your Automation account, select Python packages under Shared Resources, then click + Add a Python package.
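Once the package has been added, a quick way to confirm the runbook can see it is to run a small check before the real script. This is only a minimal sketch using the standard library; the helper name `missing_modules` and the `required` list are my own, made up for illustration, and not part of Azure Automation.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found here.

    Hypothetical helper for illustration; it only looks the modules up,
    it does not import them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Third-party modules the script in the question imports.
# "bs4" is the module name; it is provided by the beautifulsoup4 package.
required = ["requests", "bs4"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If `bs4` still shows up as missing, the package has probably not finished importing into the account yet, or it was added for a different Python version than the one the runbook uses.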
Upvotes: 1