Alanna Mueller

Reputation: 29

Storing list into CSV file from webscraping via selenium?

Warning: I'm new to Python and programming.

Objective: Scrape all job links from this page and place into a txt/csv/json/XML file: https://www.indeed.ca/jobs?q=title%3Aengineer&l=Vancouver%2C+BC

Code:

from selenium import webdriver
import csv
browser = webdriver.Firefox()
browser.get('https://www.indeed.ca/jobs?q=engineer&l=Vancouver%2C+BC&sort=date')
jobs = browser.find_elements_by_partial_link_text('Engineer')
for job in jobs:
    print(job.get_attribute("href"))
with open("output.csv",'w') as resultFile:
    wr = csv.writer(resultFile)
    wr.writerow(jobs)

It works great when it prints the results, but it doesn't store anything in the CSV file. Also, I plan to make this scrape more than one page, so what would be the best way to write to the CSV file so that new links are appended rather than overwriting the existing ones?

Upvotes: 0

Views: 294

Answers (2)

ewwink

Reputation: 19154

The links are not written to the CSV because jobs, the input to wr.writerow(jobs), is a list of WebElement objects rather than strings. You can write each element's href instead:

with open("output.csv",'w') as resultFile:
    wr = csv.writer(resultFile)
    wr.writerow([j.get_attribute("href") for j in jobs])
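For the multi-page part of the question, one possible approach is to open the file in append mode and write one link per row. This is only a minimal sketch: the helper name append_links is hypothetical (it is not part of csv or selenium), and it assumes Python 3 and that the hrefs have already been pulled out of the elements as plain strings.

```python
import csv


def append_links(path, links):
    """Append each link as its own row (hypothetical helper, not a library function).

    Mode "a" adds to the end of the file, so rows written for earlier
    pages are kept instead of being overwritten.
    """
    with open(path, "a", newline="") as result_file:
        writer = csv.writer(result_file)
        for link in links:
            # writerow expects a sequence of fields, so wrap the single link in a list
            writer.writerow([link])
```

After each page loads, it could be called as append_links("output.csv", [j.get_attribute("href") for j in jobs]). Because the file is never truncated, delete it (or open it once with 'w') before a fresh run if old results should not carry over.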

Upvotes: 1

Red Cricket

Reputation: 10470

This looks strange: for jobs in jobs:. Are you sure you didn't mean to write for job in jobs:? That is probably your problem. The loop rebinds the name jobs on every iteration, so you are stomping on your jobs iterable.

Take a look at this example:

>>> numbers = [1,2,3,4]
>>> numbers
[1, 2, 3, 4]
>>> type(numbers)
<type 'list'>
>>> for numbers in numbers:
...     print numbers
...
1
2
3
4
>>> numbers
4
>>> type(numbers)
<type 'int'>

It isn't the print numbers that turns numbers into an int; the for statement itself rebinds the name. Observe:

>>> numbers = [1,2,3,4]
>>> type(numbers)
<class 'list'>
>>> for numbers in numbers:
...    print(":)")
...    
:)
:)
:)
:)
>>> type(numbers)
<class 'int'>
>>> numbers
4
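For contrast, here is a small sketch of the same loop with a distinct loop variable. The name number is just an illustrative choice; any name other than the list's own would do.

```python
numbers = [1, 2, 3, 4]

for number in numbers:
    # 'number' is rebound on each iteration; 'numbers' is left untouched
    pass

# The list is still a list after the loop, so it can be reused later
still_a_list = isinstance(numbers, list)
```

With this naming, numbers can still be passed to something like csv.writer's writerow after the loop finishes.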

Upvotes: 0
