Reputation: 133
I don't get what's the problem here. I want to build a web scraper that scrapes amazon and takes the price and the name into a database. But for some reason, it tells me that the columns and values are not matching. I do have one additional column in my database called "timestamp" where I automatically put in the time, but that is handled by the database. I am using MariaDB. A friend said I can use the MySQL API for MariaDB as well.
P.S. preis = price, coming from Germany, switching between English and German sometimes, just in case anyone is wondering.
import requests, time, csv, pymysql
from bs4 import BeautifulSoup as bs
#URL = input("URL")
URL = "https://www.amazon.de/gp/product/B075FTXF15/ref=crt_ewc_img_bw_3?ie=UTF8&psc=1&smid=A24FLB4J0NZBNT"
def SOUPIT (tempURL):
URL = tempURL
page = requests.get(URL,headers={"User-Agent":"Defined"})
soup = bs(page.content, "html.parser")
raw_price = soup.find(id="priceblock_ourprice").get_text()
price = raw_price[:-2]
raw_name = soup.find(id="productTitle").get_text()
name = raw_name.strip()
for i in range(0,len(name)-1):
if name[i] == "(":
name = name[:i]
break
data = [name, price, time.strftime("%H:%M:%S"), time.strftime("%d.%m.%Y")]
return(data)
data = SOUPIT(URL)
while True:
data = SOUPIT(URL)
db = pymysql.connect("localhost", "root", "root", "test")
cursor = db.cursor()
if (data == None):
break
print("break")
else:
name = data[0]
preis = data[1]
sql = """INSERT INTO amazon_preise (Name, Preis) VALUES ('{}',{})""".format(name,preis)
cursor.execute(sql)
db.commit()
print("success")
print(data)
time.sleep(60)
error message:
Traceback (most recent call last):
File "amazonscraper_advanced.py", line 43, in <module>
cursor.execute(sql)
File "C:\Users\...\AppData\Local\Programs\Python\Python36\lib\site-packages\pymysql\cursors.py", line 170, in execute
result = self._query(query)
File "C:\Users\...\AppData\Local\Programs\Python\Python36\lib\site-packages\pymysql\cursors.py", line 328, in _query
conn.query(q)
File "C:\Users\...\AppData\Local\Programs\Python\Python36\lib\site-packages\pymysql\connections.py", line 517, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "C:\Users\...\AppData\Local\Programs\Python\Python36\lib\site-packages\pymysql\connections.py", line 732, in _read_query_result
result.read()
File "C:\Users\...\AppData\Local\Programs\Python\Python36\lib\site-packages\pymysql\connections.py", line 1075, in read
first_packet = self.connection._read_packet()
File "C:\Users\...\AppData\Local\Programs\Python\Python36\lib\site-packages\pymysql\connections.py", line 684, in _read_packet
packet.check_error()
File "C:\Users\...\AppData\Local\Programs\Python\Python36\lib\site-packages\pymysql\protocol.py", line 220, in check_error
err.raise_mysql_exception(self._data)
File "C:\Users\...\AppData\Local\Programs\Python\Python36\lib\site-packages\pymysql\err.py", line 109, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.InternalError: (1136, "Column count doesn't match value count at row 1")
Upvotes: 0
Views: 1507
Reputation: 55599
The problem is caused, at least partially, by a using string formatting to insert values into an SQL statement.
Here is the scraped data:
>>> data = ['Sweatshirt Alien VS. Predator Z100088', '32,99', '14:08:43', '08.09.2019']
>>> name, preis, *_ = data
Let's create the SQL statement
>>> sql = """INSERT INTO amazon_preise (Name, Preis) VALUES ('{}',{})""".format(name,preis)
And display it:
>>> sql
"INSERT INTO amazon_preise (Name, Preis) VALUES ('Sweatshirt Alien VS. Predator Z100088',32,99)"
Observe that the VALUES
clause contains three comma-separated values; this is because the web page displays currency in the German style, that is with commas separating the cents from the euros. When interpolated into the SQL statement
preis
becomes two values instead of one.
The right way to fix this would be to convert preis
from a string to a float or decimal, and use parameter substitution instead of string formatting to interpolate the values..
>>> fpreis = float(preis.replace(',', '.'))
>>> sql = """INSERT INTO amazon_preise (Name, Preis) VALUES (%s, %s)"""
>>> cursor.execute(sql, (name, fpreis))
Upvotes: 1