Tasos

Reputation: 1635

Download multiple CSV files from a web directory using Python and store them on disk, using the anchor text as the filename

From this URL: http://vs-web-fs-1.oecd.org/piaac/puf-data/CSV

I want to download all the files and save each one under the text of its anchor tag. My main struggle right now is retrieving the text of the anchor tag:

from bs4 import BeautifulSoup
import requests
import urllib.request

url_base = "http://vs-web-fs-1.oecd.org"
url_dir = "http://vs-web-fs-1.oecd.org/piaac/puf-data/CSV"

r  = requests.get(url_dir)
data = r.text
soup = BeautifulSoup(data,features="html5lib")

for link in soup.find_all('a'):
    if link.get('href').endswith(".csv"):
        print(link.find("a"))
        urllib.request.urlretrieve(url_base+link.get('href'), "test.csv")

The line print(link.find("a")) prints None. How can I retrieve the text?

Upvotes: 0

Views: 972

Answers (1)

Lucas Hort

Reputation: 854

You can get the text by accessing the tag's contents, like this:

link.contents[0]

or

link.string
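
For context, link.find("a") searches inside the anchor for a nested <a> tag, and there is none, which is why it returns None. Putting the pieces together, a minimal sketch of the whole download loop could look like the following. It assumes the directory page is an ordinary HTML listing; csv_links and download_all are hypothetical helper names, and "html.parser" and urljoin are swapped in for the question's html5lib and manual url_base concatenation. It uses get_text() rather than link.string because the latter is None when the anchor wraps more than one child node.

```python
import os
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def csv_links(html, page_url):
    """Return (anchor_text, absolute_url) pairs for every link ending in .csv.

    Hypothetical helper, not part of the original question or answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # href=True skips anchors that have no href attribute at all
    for link in soup.find_all("a", href=True):
        href = link["href"]
        if href.lower().endswith(".csv"):
            # get_text() still works when the anchor wraps nested tags
            name = link.get_text(strip=True)
            # urljoin resolves both relative and root-relative hrefs
            pairs.append((name, urljoin(page_url, href)))
    return pairs


def download_all(html, page_url, target_dir="."):
    """Save each CSV under its anchor text (hypothetical helper)."""
    for name, url in csv_links(html, page_url):
        # basename() guards against anchor text containing path separators
        target = os.path.join(target_dir, os.path.basename(name))
        urllib.request.urlretrieve(url, target)
```

With the variables from the question, this would be called as download_all(requests.get(url_dir).text, url_dir).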

Upvotes: 1
