Reputation: 29
Warning: new to Python and programming
Objective: Scrape all job links from this page and place into a txt/csv/json/XML file: https://www.indeed.ca/jobs?q=title%3Aengineer&l=Vancouver%2C+BC
Code:
from selenium import webdriver
import csv
browser = webdriver.Firefox()
browser.get('https://www.indeed.ca/jobs?q=engineer&l=Vancouver%2C+BC&sort=date')
jobs = browser.find_elements_by_partial_link_text('Engineer')
for job in jobs:
    print(job.get_attribute("href"))
with open("output.csv",'w') as resultFile:
    wr = csv.writer(resultFile)
    wr.writerow(jobs)
It works great when it prints the results, but it doesn't store the links in the csv file. Also, I plan to scrape more than one page, so what would be the best way to write the csv file so that new links are appended rather than overwriting the existing ones?
Upvotes: 0
Views: 294
Reputation: 19154
The links are not written to the csv because the argument in wr.writerow(jobs) is not valid: jobs is a list of WebElement objects, not the href strings. You can do this instead:
with open("output.csv",'w') as resultFile:
    wr = csv.writer(resultFile)
    wr.writerow([j.get_attribute("href") for j in jobs])
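For the follow-up about scraping more than one page, here is a minimal sketch, assuming Python 3. The helper name append_links is hypothetical (it is not part of Selenium or the csv module). It opens the file in append mode ('a') so earlier rows are kept, and writes one link per row instead of all links in a single row.

```python
import csv

def append_links(path, links):
    """Append one link per row to the CSV file at `path` (hypothetical helper)."""
    # Mode 'a' adds to the end of the file; mode 'w' would truncate it first.
    # newline='' is the setting the csv module documentation recommends.
    with open(path, 'a', newline='') as result_file:
        writer = csv.writer(result_file)
        for link in links:
            # Each row is a one-element list, so every link gets its own line.
            writer.writerow([link])
```

Under these assumptions, it could be called once per scraped page, for example append_links("output.csv", [j.get_attribute("href") for j in jobs]).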
Upvotes: 1
Reputation: 10470
The loop header for jobs in jobs: looks strange. Are you sure you didn't mean to write for job in jobs: instead? That is probably your problem: you are stomping on your jobs iterable, because the loop rebinds the name jobs to each element in turn.
Take a look at this example:
>>> numbers = [1,2,3,4]
>>> numbers
[1, 2, 3, 4]
>>> type(numbers)
<type 'list'>
>>> for numbers in numbers:
...     print numbers
...
1
2
3
4
>>> numbers
4
>>> type(numbers)
<type 'int'>
It isn't the print numbers statement that is turning numbers into an int; the for loop itself rebinds the name. Observe:
>>> numbers = [1,2,3,4]
>>> type(numbers)
<class 'list'>
>>> for numbers in numbers:
...     print(":)")
...
:)
:)
:)
:)
>>> type(numbers)
<class 'int'>
>>> numbers
4
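For contrast, a small sketch (Python 3) of the same loop with a distinct loop variable. The name number is only illustrative. The list stays bound to numbers after the loop, and only the loop variable ends up holding the last element.

```python
numbers = [1, 2, 3, 4]

# Use a separate name for the loop variable so the list is not rebound.
for number in numbers:
    pass

# After the loop, numbers is still the list; number holds the last element.
still_a_list = isinstance(numbers, list)
```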
Upvotes: 0