Kartal Tibet

Reputation: 45

Writing scraped data to a csv file

I'm using the code below to scrape data from a job site with BeautifulSoup and write it to a csv file. The scraping part works: when I print the extracted fields, they look fine. However, I can't get the scraped data into the csv file properly. A csv file is created, but each column contains only single letters like a, b, c instead of the complete words for the title, salary, etc. Can anyone help me with this?

import requests
import csv
from bs4 import BeautifulSoup

r = requests.get("https://www.reed.co.uk/jobs/accountancy-jobs")
soup = BeautifulSoup(r.content, "html.parser")
#print(soup.prettify())
jobs = soup.find_all("article")

for job in jobs:
    # pull the fields for this job listing
    title = job.h3.text
    posterline = job.find("div", attrs={"class": "posted-by"})
    poster = posterline.find("a").text
    postdate = posterline.next_element
    description = job.find("div", attrs={"class": "description"})
    metadata = job.find("div", attrs={"class": "metadata"})
    salary = metadata.find("li", attrs={"class": "salary"}).text
    time = metadata.find("li", attrs={"class": "time"})

    datas=(title, salary, time, postdate, poster)
    with open('reeddata.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        headers = ['Title','Salary','Time', 'Postdate','Poster']
        writer.writerow(headers)
        for data in datas:
            writer.writerow(data)

Upvotes: 0

Views: 134

Answers (2)

MITHU

Reputation: 154

Try the script below. It opens the csv file once, before the loop, and passes each row to writerow() as a list, so every field goes into its own column:

import requests
from bs4 import BeautifulSoup
import csv

r = requests.get("https://www.reed.co.uk/jobs/accountancy-jobs")
soup = BeautifulSoup(r.content,"html.parser")

# open the output file once and write the header row before looping over the jobs
with open('reeddata.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Title','Salary','Time', 'Postdate','Poster'])

    for job in soup.find_all("article"):
        title = job.find("h3",class_="title").find("a",href=True).get_text(strip=True)
        poster = job.find("div", class_="posted-by").find("a").get_text(strip=True)
        postdate = job.find('div',class_='posted-by').next_element.strip()
        salary = job.find("div",class_="metadata").find("li",class_="salary").get_text(strip=True)
        time = job.find("div",class_="metadata").find("li",class_="time").get_text(strip=True)
        # pass the fields as a list so each value lands in its own column
        writer.writerow([title, salary, time, postdate, poster])

Upvotes: 1

thesylio

Reputation: 144

It's an indentation issue: for every job you find, you open the csv in write mode, write that job's data, and close it, so the next job overwrites the file. Un-indent the writing block so it runs only once after the loop, and append your values to "datas" instead of redefining it for every job. Passing each row as a tuple of fields also fixes the single-letter columns, because writerow() iterates over whatever it is given, and iterating over a string yields its individual characters. A minimal sketch of that restructuring is below.
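
This sketch reuses the selectors from the question unchanged, so it assumes every listing exposes the same fields and has not been verified against the current page layout:

import requests
import csv
from bs4 import BeautifulSoup

r = requests.get("https://www.reed.co.uk/jobs/accountancy-jobs")
soup = BeautifulSoup(r.content, "html.parser")

datas = []  # one tuple per job, collected across the whole loop
for job in soup.find_all("article"):
    title = job.h3.text
    posterline = job.find("div", attrs={"class": "posted-by"})
    poster = posterline.find("a").text
    postdate = posterline.next_element
    metadata = job.find("div", attrs={"class": "metadata"})
    salary = metadata.find("li", attrs={"class": "salary"}).text
    time = metadata.find("li", attrs={"class": "time"}).text
    datas.append((title, salary, time, postdate, poster))

# open the file once, after the loop, so earlier rows are not overwritten
with open('reeddata.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'Salary', 'Time', 'Postdate', 'Poster'])
    for data in datas:
        writer.writerow(data)  # each data is a tuple, so every field gets its own column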

Upvotes: 0
