nishida
nishida

Reputation: 23

Prevent duplicate rows in mysql database

I wrote code to scrape car info (title, make, model, transmission, year, price) data from ebay.com and save in Mysql.

I want if all a row's (title, make, model, ...) items are similar to another row then avoid to insert this data to Mysql, only when all of a row's items are similar (don't suppress insertion if only the title, or the model etc is identical).

I created the database and table very simple (i'm beginner), if need i can delete database and create a new

Code:

import requests
from bs4 import BeautifulSoup
import re
import mysql.connector

conn = mysql.connector.connect(user='root', password='1234', 
host='127.0.0.1', database='web_scraping')
cursor = conn.cursor()
url = 'https://www.ebay.com/b/Cars-Trucks/6001?_fsrp=0&_sacat=6001&LH_BIN=1&LH_ItemCondition=3000%7C1000%7C2500&rt=nc&_stpos=951 25&Model%2520Year=2020%7C2019%7C2018%7C2017%7C2016%7C2015'
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
ebay_cars = soup.find_all('li', class_='s-item')
for car_info in ebay_cars:
    title_div = car_info.find('div', class_='s-item__wrapper clearfix')
    title_sub_div = title_div.find('div', class_='s-item__info clearfix')
    title_p = title_sub_div.find('span', class_='s-item__price')
    title_tag = title_sub_div.find('a', class_='s-item__link')
    title_maker = title_sub_div.find('span', class_='s-item__dynamic s- 
    item__dynamicAttributes1')
    title_model = title_sub_div.find('span', class_='s-item__dynamic s- 
    item__dynamicAttributes2')
    title_trans = title_sub_div.find('span', class_='s-item__dynamic s- 
    item__dynamicAttributes3')



    name_of_car = re.sub(r'\d{4}', '', title_tag.text)
    maker_of_car = re.sub(r'Make: ','', title_maker.text)
    model_of_car = re.sub(r'Model: ', '', title_model.text)
    try:
        if title_trans.text.startswith(r'Transmission: '):
            trans_of_car = re.sub(r'Transmission: ', '', title_trans.text)
        else:
            trans_of_car = ''
    except AttributeError:
        trans_of_car = ''
    year_of_car = re.findall(r'\d{4}', title_tag.text)
    year_of_car = ''.join(str(x) for x in year_of_car)

    price_of_car = title_p.text
    print(name_of_car ,trans_of_car )
    sql = 'INSERT INTO car_info(Title, Maker, Model, Transmission, Year, 
    Price) VALUES (%s, %s, %s, %s, %s, %s)'
    cursor.execute(sql , (name_of_car, maker_of_car, model_of_car, 
    trans_of_car, year_of_car, price_of_car))



conn.commit()
conn.close()

Upvotes: 0

Views: 629

Answers (3)

Chris Rahmé
Chris Rahmé

Reputation: 368

Try this in MySQL:

ALTER TABLE car_info
ADD CONSTRAINT unique_car
UNIQUE (title, maker, model, transmission, year, price)

You only have to do it once, and no need to recreate your table. Just copy and paste this into the console.

Don't hesitate to ask for more help if it's not clear!

Upvotes: 1

Moritz Pfennig
Moritz Pfennig

Reputation: 172

This is not directly answering your question, but I am not yet able to comment... I would suggest you to look at SQLAlchemy, a libary for python to make working with sql dtabases easier. There you can for example give an argument unique=True in the model generator. See herer for SQLAlchemy: https://www.sqlalchemy.org/

Upvotes: 0

Adarsh
Adarsh

Reputation: 116

You can try to make the columns of the mysql table as unique so that it won't allow redundant data to insert

CREATE TABLE TABLE1(
  col1 varchar(20),
  col2 varchar(20),
  col3 varchar(20),
  col4 varchar(20),
  CONSTRAINT uc UNIQUE (col1,col2,col3,col4)
 );

Hope this helps :-)

Upvotes: 1

Related Questions