Box-Of-Hats

Reputation: 33

Using urllib in Python 3 to download a file gives HTTP Error 403: how do I fake a user agent?

I'm using PhantomJS and Selenium to convert YouTube videos to mp3s through anything2mp3.com, and then trying to download the resulting files.

I'm trying to use urllib in Python 3 to download a .mp3 file. However, when I try:

import urllib.request

url = 'example.com'  # placeholder for the real download link
fileName = 'testFile.mp3'
urllib.request.urlretrieve(url, fileName)

I get the error:

urllib.error.HTTPError: HTTP Error 403: Forbidden

After hours of searching, I've found that the likely cause is the website rejecting the user agent that urllib sends by default. I've tried to change the user agent but haven't had any luck, since urlretrieve has no parameter for passing headers.

Upvotes: 3

Views: 1449

Answers (1)

drets

Reputation: 2795

Use the requests library, which lets you pass a custom User-Agent header with the request:

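import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

SERVICE_URL = 'http://anything2mp3.com/'
YOUTUBE_URL = 'https://youtu.be/AqCWi_-vnTg'
FILE_NAME = 'song.mp3'

USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'

# Get mp3 link using selenium
# (webdriver.PhantomJS and find_element_by_css_selector are Selenium 3 APIs;
# newer Selenium releases no longer ship them)

browser = webdriver.PhantomJS()
browser.get(SERVICE_URL)
search = browser.find_element_by_css_selector('#edit-url')
search.send_keys(YOUTUBE_URL)
submit = browser.find_element_by_css_selector('#edit-submit--2')
submit.click()
# Wait up to 20 seconds for the conversion to finish and the link to appear
a = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, '#block-system-main > a')))
download_link = a.get_attribute('href')
browser.quit()  # the browser is no longer needed once we have the link

# Download file using requests
# http://docs.python-requests.org/en/latest/

r = requests.get(download_link, stream=True, headers={'User-Agent': USER_AGENT})
r.raise_for_status()  # fail loudly on 403 instead of saving the error page
with open(FILE_NAME, 'wb') as f:
    # Stream the body in 1 KiB chunks so the whole file is never held in memory
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)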
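
If you would rather stay with urllib, as in the question, the same idea should work there too: urlretrieve takes no headers, but urllib.request.Request does, and the request can then be passed to urlopen. The following is only a minimal sketch under that assumption. The helper names build_request and download are made up for illustration, and the user agent string is just the one used above. Whether the site accepts it is not something this code can guarantee.

```python
import shutil
import urllib.request

# Assumed browser-like user agent; any current browser string could be substituted.
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'


def build_request(url, user_agent=USER_AGENT):
    """Hypothetical helper: wrap the URL in a Request carrying a custom User-Agent."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def download(url, file_name, user_agent=USER_AGENT):
    """Hypothetical helper: stream the response body into file_name."""
    request = build_request(url, user_agent)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses such as 403.
    with urllib.request.urlopen(request) as response, open(file_name, 'wb') as f:
        # Copy in chunks so the whole file is never held in memory.
        shutil.copyfileobj(response, f)
```

With the Selenium step above, this would be called as download(download_link, FILE_NAME). The header set on the Request replaces the default Python-urllib user agent for that one request only.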

Upvotes: 1
