ASH

Reputation: 20342

How can we run a standard Python script in Azure and save files to a Data Lake?

I have a Python script that runs perfectly fine on my laptop. I am trying to move it to Azure and run it there. Ideally, I would like to do some basic screen scraping and some basic transformation, and then save the data files in the Data Lake or maybe Storage Explorer (the lake is probably better). So, I set up an Automation Account and a 'Run As Account'. Now, when I try to run the code (hit the Start button) in an Azure Runbook, I get this error message.

Failed
Traceback (most recent call last):
  File "C:\Temp\3fgngmon.o45\7e326422-ff39-4a2c-93f9-4afafd46205c", line 2, in <module>
    from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'

Here is my sample code.

import requests
from bs4 import BeautifulSoup
from urllib.parse import unquote
import csv
import io

all_links = [
    "/vsoch/hospital-chargemaster/tree/0.0.2/data/ochsner-clinic-foundation",
    "/vsoch/hospital-chargemaster/tree/0.0.2/data/ohio-state-university-hospital",
    "/vsoch/hospital-chargemaster/tree/0.0.2/data/orlando-health",
    "/vsoch/hospital-chargemaster/tree/0.0.2/data/st.-joseph%E2%80%99s-hospital-(tampa)",
]

for item in all_links:
    # Drop 'tree/' so the path matches the raw.githubusercontent.com layout
    item = item.replace('tree/', '')
    
    try:
        file_name = unquote(item.split('/')[-1])
        DOWNLOAD_URL = f'https://raw.githubusercontent.com{item}/data-latest.tsv'
        r_tsv = requests.get(DOWNLOAD_URL)
        
        if r_tsv.status_code == 404:
            print(f"Not found - {DOWNLOAD_URL}")
        else:
            print(f"Downloaded - {DOWNLOAD_URL}")
            data = list(csv.reader(io.StringIO(r_tsv.text), delimiter='\t'))
            DOWNLOAD_PATH = fr'C:\Users\ryans\Desktop\hospital_data\{file_name}.csv'
            
            with open(DOWNLOAD_PATH, 'w', newline='') as f_output:
                csv_output = csv.writer(f_output)
                csv_output.writerows(data)
    except Exception as e: 
        print(e)

Somehow, I think I need to do a pip install, but I am not sure how to do that in Azure. I also need to change the save path so the files go to the Data Lake (or Storage Explorer) instead of my local desktop, and I am not sure how to do that either. How can I get this up and running?

Upvotes: 0

Views: 1027

Answers (1)

Ken W - Zero Networks

Reputation: 3814

You need to import the Beautiful Soup package (bs4) into your Automation account.

In your Automation account, select Python packages under Shared Resources. Click + Add a Python package.
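Once bs4 (and, if you also want to write to the lake, azure-storage-file-datalake) are added as packages, the script can upload each CSV to ADLS Gen2 in memory instead of writing to a local `C:\Users\...` path, which will not exist in the Runbook sandbox. A minimal sketch — the account URL, file system name, folder path, and credential below are placeholders, not values from your setup:

```python
import csv
import io


def tsv_to_csv_bytes(tsv_text):
    """Convert raw TSV text to CSV-formatted bytes, entirely in memory."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter='\t')
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue().encode('utf-8')


def upload_to_datalake(csv_bytes, file_name,
                       account_url, file_system, credential):
    """Upload bytes to an ADLS Gen2 path.

    Requires azure-storage-file-datalake, added as a package in the
    Automation account; imported inside the function so the rest of the
    script still loads without it.
    """
    from azure.storage.filedatalake import DataLakeServiceClient
    service = DataLakeServiceClient(account_url=account_url,
                                    credential=credential)
    fs_client = service.get_file_system_client(file_system=file_system)
    # 'hospital_data' is an illustrative folder name inside the file system
    file_client = fs_client.get_file_client(f'hospital_data/{file_name}.csv')
    file_client.upload_data(csv_bytes, overwrite=True)


# Inside your download loop, replace the open()/writerows() block with
# something like (placeholder account and credential values):
# csv_bytes = tsv_to_csv_bytes(r_tsv.text)
# upload_to_datalake(csv_bytes, file_name,
#                    'https://<account>.dfs.core.windows.net',
#                    '<file-system-name>', '<storage-account-key>')
```

Converting in memory with `io.StringIO` avoids any dependency on the sandbox's temporary file system, and `overwrite=True` lets the Runbook be re-run without failing on existing files.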


Upvotes: 1
