Reputation: 45
I'm using the code below to scrape data from a job site with BeautifulSoup and write it to a CSV file. The scraping itself seems to work: when I print the extracted values, they look fine. However, I can't get the scraped data written to the CSV file correctly. A CSV file is created, but each column contains only single letters like a, b, c instead of the complete words describing the title, salary, etc. Can anyone help me with this?
import requests
import csv
from bs4 import BeautifulSoup

r=requests.get("https://www.reed.co.uk/jobs/accountancy-jobs")
r.content
soup=BeautifulSoup(r.content)
#print(soup.prettify())
soup.find_all("article")
jobs=soup.find_all("article")
for job in jobs:
    title=job.h3.text
    posterline=job.find("div", attrs={"class":"posted-by"})
    poster=posterline.find("a").text
    postdate=job.find('div',{'class': 'posted-by'}).next_element
    description=job.find("div", attrs={"class":"description"})
    metadata=job.find("div", attrs={"class":"metadata"})
    metadata=job.find("div", attrs={"class":"metadata"})
    salary=metadata.find("li", attrs={"class": "salary"})
    salary=salary.text
    time=metadata.find("li", attrs={"class": "time"})
    datas=(title, salary, time, postdate, poster)
    with open('reeddata.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        headers = ['Title','Salary','Time', 'Postdate','Poster']
        writer.writerow(headers)
        for data in datas:
            writer.writerow(data)
Upvotes: 0
Views: 134
Reputation: 154
Try the script below. It fetches the required content and writes one row per job to the CSV file:
import requests
from bs4 import BeautifulSoup
import csv

r = requests.get("https://www.reed.co.uk/jobs/accountancy-jobs")
soup = BeautifulSoup(r.content,"html.parser")

with open('reeddata.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Title','Salary','Time', 'Postdate','Poster'])
    for job in soup.find_all("article"):
        title = job.find("h3",class_="title").find("a",href=True).get_text(strip=True)
        poster = job.find("div", class_="posted-by").find("a").get_text(strip=True)
        postdate = job.find('div',class_='posted-by').next_element.strip()
        salary = job.find("div",class_="metadata").find("li",class_="salary").get_text(strip=True)
        time = job.find("div",class_="metadata").find("li",class_="time").get_text(strip=True)
        writer.writerow([title, salary, time, postdate, poster])
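As a side note on where the single letters come from: csv.writer.writerow expects a sequence of fields, and a plain string is itself a sequence of characters, so passing one string gives one letter per column. Here is a minimal sketch of the difference. It writes to an in-memory io.StringIO instead of reeddata.csv, and the values and the rows_to_csv helper are made up for illustration, not taken from the site or the code above.

```python
import csv
import io


def rows_to_csv(rows):
    """Call writerow on each item in rows and return the resulting CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Made-up example values, not real scraped data.
title, salary = "Accountant", "30000"

# A bare string: writerow iterates over its characters, one per column.
as_letters = rows_to_csv([title])

# A list of fields: one column per field, which is what the script above does.
as_fields = rows_to_csv([[title, salary]])

print(as_letters)
print(as_fields)
```

Printing the two results shows the first row split into letters and the second as two proper columns.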
Upvotes: 1
Reputation: 144
It's an indentation issue. For every job you find, you open the CSV file, write that job's data, and close it, so the next job overwrites the file. Try unindenting the writing loop so it runs once after the scraping loop, and append your values to "datas" instead of redefining it for every job.
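A rough sketch of that restructuring, assuming the scraping part already yields the five values per job. To keep it runnable without the network, the scraped list below is hypothetical stand-in data, the write_jobs helper is my own name, and the output goes to an in-memory buffer rather than reeddata.csv.

```python
import csv
import io


def write_jobs(rows, file):
    """Write the header once, then one CSV row per collected job."""
    writer = csv.writer(file)
    writer.writerow(['Title', 'Salary', 'Time', 'Postdate', 'Poster'])
    writer.writerows(rows)


# Hypothetical stand-in for the values extracted from each <article>.
scraped = [
    ("Accountant", "30000", "Permanent", "Today", "Acme Ltd"),
    ("Bookkeeper", "25000", "Contract", "Yesterday", "Example Co"),
]

datas = []  # created once, before the loop
for title, salary, time, postdate, poster in scraped:
    # Append one tuple per job instead of overwriting datas each time.
    datas.append((title, salary, time, postdate, poster))

# Writing happens once, after the loop has finished.
buffer = io.StringIO(newline="")
write_jobs(datas, buffer)
print(buffer.getvalue())
```

With a real file you would replace the buffer with open('reeddata.csv', 'w', newline='') in a with block, still placed after the loop.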
Upvotes: 0