user10358702

Cannot read a CSV of URLs to web-scrape them in Python

I am brand new to Python, so I tried the following with Visual Studio on Windows 7:

import csv
from bs4 import BeautifulSoup 
import requests 

contents = []
with open('websupplies.csv','r') as csvf: # Open file in read mode
   urls = csv.reader(csvf)

   for url in urls:
      contents.append(url) # Add each url to list contents


for url in contents:  # Parse through each url in the list.
   page = requests.get(url).content
   soup = BeautifulSoup(page, "html.parser")

   price = soup.find('span', attrs={'itemprop':'price'})
   availability = soup.find('div', attrs={'class':'product-availability'})

But I get this error: No connection adapters were found for .. '['a url']'

Why?

The structure of the CSV is as follows:

https://www.websupplies.gr/epeksergastis-intel-core-i5-8400-9mb-2-80ghz-bx80684i58400
https://www.websupplies.gr/epeksergastis-intel-celeron-g3930-2mb-2-90ghz-bx80677g3930
https://www.websupplies.gr/epeksergastis-amd-a6-9500-bristol-ridge-dual-core-3-5ghz-socket-am4-65w-ad9500agabbox

The lines don't have a semicolon at the end.

Upvotes: 2

Views: 100

Answers (2)

Daniel Roseman

Reputation: 599480

Your file is a flat list of URLs. It's not really a CSV.

The CSV reader reads each row into its own list. So the structure of the loaded data would be:

[
  ["https://www.websupplies.gr/epeksergastis-intel-core-i5-8400-9mb-2-80ghz-bx80684i58400"],
  ["https://www.websupplies.gr/epeksergastis-intel-celeron-g3930-2mb-2-90ghz-bx80677g3930"],
  ["https://www.websupplies.gr/epeksergastis-amd-a6-9500-bristol-ridge-dual-core-3-5ghz-socket-am4-65w-ad9500agabbox"],
]

One way to fix this would be to use url[0] as the parameter to requests.get, but really the proper fix is not to use CSV at all. Since you have only one piece of data per line, you can just read the data directly and pass it to requests:

contents = []
with open('websupplies.csv', 'r') as csvf:  # Open file in read mode
   for line in csvf:
      contents.append(line.strip())  # Strip the trailing newline, then add the URL to contents
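To make the difference concrete, here is a minimal sketch that compares the two approaches. It uses io.StringIO as a stand-in for the file so that it runs without websupplies.csv on disk; the names sample, rows, urls_from_csv and urls_from_lines are illustrative and do not come from the question.

```python
import csv
import io

# Stand-in for the contents of websupplies.csv (two URLs from the question).
sample = (
    "https://www.websupplies.gr/epeksergastis-intel-core-i5-8400-9mb-2-80ghz-bx80684i58400\n"
    "https://www.websupplies.gr/epeksergastis-intel-celeron-g3930-2mb-2-90ghz-bx80677g3930\n"
)

# csv.reader yields one list per row, even when the row has a single field.
rows = list(csv.reader(io.StringIO(sample)))

# Option 1: keep csv.reader and take the first field of each non-empty row.
urls_from_csv = [row[0] for row in rows if row]

# Option 2: skip csv entirely and read each line as a plain string.
urls_from_lines = [line.strip() for line in io.StringIO(sample) if line.strip()]
```

Both lists hold plain strings, so each element can be passed to requests.get as it is.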

Upvotes: 1

Giovanni Cappelletti

Reputation: 71

According to this question, requests needs the URL to include the http scheme; maybe that is the problem? You also have to remove the trailing \n when you read the lines from the file.
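One way to check both points before calling requests is sketched below. The helper clean_url is hypothetical (it is not part of requests or of the question's code); it strips surrounding whitespace and uses the standard library's urllib.parse.urlparse to confirm that the string starts with an http or https scheme.

```python
from urllib.parse import urlparse


def clean_url(line):
    """Hypothetical helper: return the stripped URL, or None if it has no http(s) scheme."""
    url = line.strip()  # removes the trailing newline and any stray spaces
    if urlparse(url).scheme in ("http", "https"):
        return url
    return None


# A line as read from the file, newline included.
good = clean_url("https://www.websupplies.gr/epeksergastis-intel-core-i5-8400-9mb-2-80ghz-bx80684i58400\n")

# What a one-element list looks like once it has been turned into a string.
bad = clean_url(str(["https://www.websupplies.gr/epeksergastis-intel-core-i5-8400-9mb-2-80ghz-bx80684i58400"]))
```

Here good is a usable URL while bad is None, because the stringified list begins with "['" rather than a scheme. That matches the shape of the value quoted in the error message.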

Upvotes: 1
