Reputation: 83
I'm new here and a Python newbie, currently learning the basics, mostly scraping, and I ran into a problem that I hope you can help me solve.
I'm trying to scrape a few details from a website and write them into a CSV file, but I'm only able to get the last result into my CSV; apparently my script just overwrites the earlier data.
Also, if you find any mistakes in my code or any room for improvement (which I'm sure there is), I'd be glad if you pointed those out as well.
Finally, any recommendations for videos/tutorials that could help me improve my Python and scraping skills would be appreciated.
import requests
from bs4 import BeautifulSoup
import csv

url = 'https://www.tamarackgc.com/club-contacts'
source = requests.get(url).text
soup = BeautifulSoup(source, 'lxml')

csv_file = open('contacts.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(["department", "name", "position", "phone"])

for department in soup.find_all("div", class_="view-content"):
    department_name = department.h3
    print(department_name.text)

for contacts in soup.find_all("div", class_="col-md-7 col-xs-10"):
    contact_name = contacts.strong
    print(contact_name.text)

for position in soup.find_all("div", class_="field-content"):
    print(position.text)

for phone in soup.find_all("div", class_="modal-content"):
    first_phone = phone.h3
    first_phones = first_phone
    print(first_phones)

csv_writer.writerow([department_name, contact_name, position, first_phones])
csv_file.close()
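For illustration, here is a minimal, self-contained sketch of the symptom I'm seeing (using io.StringIO in place of a real file): calling writerow inside the loop records one row per iteration, while calling it once after the loop only records the final values.

```python
import csv
import io

rows = [["Bob", "Manager"], ["Alice", "Director"]]

# writerow called inside the loop: every row is written
buf = io.StringIO()
writer = csv.writer(buf)
for row in rows:
    writer.writerow(row)
print(len(buf.getvalue().splitlines()))  # 2 lines

# writerow called once after the loop: only the last values survive
buf2 = io.StringIO()
writer2 = csv.writer(buf2)
for row in rows:
    pass  # the loop variable keeps updating, but nothing is written yet
writer2.writerow(row)  # `row` still holds the last item
print(len(buf2.getvalue().splitlines()))  # 1 line
```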
Upvotes: 0
Views: 192
Reputation: 189
Hi Babr, welcome to Python. Your answer is good, and here is one more little thing you can do better:
use find instead of find_all if you just want a single element.
import requests
from bs4 import BeautifulSoup
import csv

url = 'https://www.tamarackgc.com/club-contacts'
source = requests.get(url).text
soup = BeautifulSoup(source, 'lxml')

f = open("/Users/mingjunliu/Downloads/contacts.csv", "w+")
csv_writer = csv.writer(f)
csv_writer.writerow(["Name", "Position"])

for info in soup.find_all("div", class_="well profile"):
    contact_name = info.find("div", class_="col-md-7 col-xs-10")
    names = contact_name.strong
    name = names.text
    print(name)

    position_name = info.find("div", class_="field-content")
    position = position_name.text
    print(position)
    print("")

    csv_writer.writerow([name, position])

f.close()
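The find vs find_all distinction can be sketched on a tiny inline document (using the stdlib html.parser backend here so lxml isn't required):

```python
from bs4 import BeautifulSoup

html = "<div><strong>Alice</strong><strong>Bob</strong></div>"
soup = BeautifulSoup(html, "html.parser")

# find returns only the first matching tag (or None if nothing matches)
first = soup.find("strong")
print(first.text)  # Alice

# find_all returns a list of every match
all_tags = soup.find_all("strong")
print([t.text for t in all_tags])  # ['Alice', 'Bob']
```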
And the reason you need to drop the phone and department fields is the website's bad structure. It's not your fault.
Upvotes: 1
Reputation: 83
Thanks Thomas! Actually, I tweaked my code a little by thinking about how to make it simpler (four for loops are too much, no?), and with the following code I solved my problem (I dropped 'department' and 'phones' because of some other issues):
import requests
from bs4 import BeautifulSoup
import csv

url = 'https://www.tamarackgc.com/club-contacts'
source = requests.get(url).text
soup = BeautifulSoup(source, 'lxml')

f = open("contactslot.csv", "w+")
csv_writer = csv.writer(f)
csv_writer.writerow(["Name", "Position"])

information = soup.find_all("div", class_="well profile")
for info in information:
    contact_name = info.find_all("div", class_="col-md-7 col-xs-10")
    names = contact_name[0].strong
    name = names.text
    print(name)

    position_name = info.find_all("div", class_="field-content")
    position = position_name[0].text
    print(position)
    print("")

    csv_writer.writerow([name, position])

f.close()
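One further polish worth considering (a sketch, with a hypothetical filename contacts_demo.csv): per the csv module docs, opening the file with newline='' prevents blank lines between rows on Windows, and a with block closes the file automatically even if an error occurs mid-write.

```python
import csv

rows = [("Alice", "Director"), ("Bob", "Manager")]

# newline='' lets the csv module control line endings itself;
# the with block guarantees the file is closed when it ends.
with open("contacts_demo.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Position"])
    writer.writerows(rows)  # writes all rows in one call

with open("contacts_demo.csv", newline="") as f:
    print(sum(1 for _ in csv.reader(f)))  # 3 rows: header + 2 contacts
```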
Upvotes: 2