Palle Broe

Reputation: 99

BeautifulSoup output is not transferred to CSV file

I'm trying to export the output from a webscraper to a CSV file. The code works and I get the right output when I run it in the terminal, but it does not transfer to the CSV file.

Question

When I remove the first for loop, it works fine, but I can't figure out exactly what the error is in this part.

Code

import csv
import requests
from bs4 import BeautifulSoup

outfile = open('ImplementTest8.csv','w')
writer = csv.writer(outfile)
writer.writerow(["job_link", "job_desc"])

res = requests.get("http://implementconsultinggroup.com/career/#/6257").text
soup = BeautifulSoup(res,"lxml")
links = soup.find_all("a")

for li in soup.find('ul', class_='list-articles list').find_all('li'):
    level = li.find_all('dd', {'class': 'author'})[1].get_text()
    if "Graduate" in level:
        links = li.find_all("href")
        for link in links:
            if "career" in link.get("href") and 'COPENHAGEN' in link.text:
                item_link = link.get("href").strip()
                item_text = link.text.replace("View Position","").encode('utf-8').strip()
                writer.writerow([item_link, item_text])
                print(item_link, item_text)

Edited Code

import csv
import requests
from bs4 import BeautifulSoup

outfile = open('ImplementTest8.csv','w')
writer = csv.writer(outfile)
writer.writerow(["job_link", "job_desc"])

res = requests.get("http://implementconsultinggroup.com/career/#/6257").text
soup = BeautifulSoup(res,"lxml")
links = soup.find_all("a")

for li in soup.find('ul', class_='list-articles list').find_all('li'):
    level = li.find_all('dd', {'class': 'author'})[1].get_text()
    if "Graduate" in level:
        links = li.find_all(href=True)
        for link in links:
            if "career" in link.get("href") and 'COPENHAGEN' in link.text:
                item_link = link.get("href").strip()
                item_text = link.text.replace("View Position","").encode('utf-8').strip()
                writer.writerow([item_link, item_text])
                print(item_link, item_text)

Upvotes: 0

Views: 145

Answers (1)

t.m.adam

Reputation: 15376

href is a tag attribute, not a tag name, so find_all("href") searches for <href> tags and finds none. If you want to ensure that all your links have an href attribute, you can pass it as a keyword argument; otherwise, use the tag name.

links = li.find_all(href=True)
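
To see the difference in isolation, here is a minimal sketch that runs without a network request. The HTML snippet is hypothetical markup standing in for one list item of the real page, the built-in "html.parser" is used instead of "lxml" only to avoid an extra dependency, and the CSV is written to an in-memory buffer rather than to ImplementTest8.csv.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for one <li> of the job list (an assumption,
# not the real page source).
HTML = """
<ul>
  <li>
    <a href="/career/123">Graduate COPENHAGEN View Position</a>
    <a name="anchor-only">no link here</a>
  </li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
li = soup.find("li")

# A string argument is treated as a tag name, so this looks for <href> tags.
by_tag_name = li.find_all("href")
# A keyword argument filters on an attribute: any tag that has an href.
by_attribute = li.find_all(href=True)
# Both filters combined: only <a> tags that have an href.
anchors_only = li.find_all("a", href=True)

# Write the matches to an in-memory CSV so the result can be inspected.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["job_link", "job_desc"])
for link in anchors_only:
    item_link = link["href"].strip()
    item_text = link.get_text().replace("View Position", "").strip()
    writer.writerow([item_link, item_text])

csv_text = buffer.getvalue()
print(len(by_tag_name), len(by_attribute))
print(csv_text)
```

With the tag-name form the inner loop never runs, which would explain why only the header row reaches the file. When writing to a real file, opening it in a with block (for example with open('ImplementTest8.csv', 'w', newline='') as outfile:) is a reasonable way to make sure the rows are flushed and the file is closed.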

Upvotes: 2
