SeánMcK
SeánMcK

Reputation: 422

Populating a MySQL table with scraped data

I'm using Python 3, MySQL, Sequel Pro and BeautifulSoup.

Put simply, I want to create a SQL table and then insert my downloaded data into that data.

I've used this answer as a template to build the SQL part Beautiful soup webscrape into mysql, but it won't work.

Errors thrown:

line 86 finally:SyntaxError: invalid syntax

When I comment out this last finally: (just see if the rest of the code worked) I get:

InternalError: (1054, "Unknown column 'address' in 'field list'") 

Another common error I got was:

ProgrammingError: (1146, "Table 'simple_scrape.simple3' doesn't exist", though I can't remember the exact changes I made to end up with this error.

Finally- I started to learn programming (not just Python, but 'programming') less than four weeks ago- if you're wondering why I've done something stupid or inefficient it's almost certainly because that was the first way I got it to work! Please help!

Code:

from selenium import webdriver

#Guess BER Number
for i in range(108053983,108053985):
    try:    
#        ber_try = 100000000 
        ber_try =+i
#Open page & insert BER Number
        browser = webdriver.Firefox()
        type(browser)
        browser.get('https://ndber.seai.ie/pass/ber/search.aspx')
        ber_send = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_dfSearch_txtBERNumber')
        ber_send.send_keys(ber_try)
        
 #click search
        form = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_dfSearch_Bottomsearch')
        form.click()
        

#click intermediate page
        form = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_gridRatings_gridview_ctl02_ViewDetails')
        form.click()
               
#scrape the page
        import bs4
        
    
        
      
        soup = bs4.BeautifulSoup(browser.page_source)
        
        
        # First Section
        ber_dec = soup.find('fieldset', {'id':'ctl00_DefaultContent_BERSearch_fsBER'})
        
        
        address = ber_dec.find('div', {'id':'ctl00_DefaultContent_BERSearch_dfBER_div_PublishingAddress'})
        address = (address.get_text(', ').strip())
        print(address)
        
        
        date_issue = ber_dec.find('span', {'id':'ctl00_DefaultContent_BERSearch_dfBER_container_DateOfIssue'}) 
        date_issue = date_issue.get_text().strip()
        print(date_issue)
        
    except:  
        print('Invalid BER Number:', ber_try)
        browser.quit()
   
       
     #connecting to mysql       

  
    finally:
            import pymysql.cursors
            from pymysql import connect, err, sys, cursors
     
    #Making the connection
            connection = pymysql.connect(host = '127.0.0.1',
                                        port = 3306,
                                        user = 'root',
                                        passwd = 'root11',
                                        db = 'simple_scrape',
                                        cursorclass=pymysql.cursors.DictCursor);

            with connection.cursor() as cursor:
                sql= """CREATE TABLE `simple3`(
                (
                `ID` INT AUTO_INCREMENT NOT NULL,
                `address` VARCHAR( 200 ) NOT NULL,
                `date_issue` VARCHAR( 200 ) NOT NULL,
                
                PRIMARY KEY ( `ID` )
            )Engine = MyISAM)"""
        
                sql = "INSERT INTO `simple3` (`address`, `date_issue`) VALUES (%s, %s)"
                cursor.execute(sql, (address, date_issue))
            connection.commit()
    finally:
            connection.close()
    
    browser.quit()
        

Upvotes: 1

Views: 1296

Answers (1)

Norbert
Norbert

Reputation: 6084

Issues: And actually create the table

            sql= """CREATE TABLE simple3(
            (
            ID INT AUTO_INCREMENT NOT NULL,
            address VARCHAR( 200 ) NOT NULL,
            date_issue VARCHAR( 200 ) NOT NULL,

            PRIMARY KEY ( ID )
        )Engine = MyISAM)"""
// Added this line since your table was not being created.
            cursor.execute(sql)

            sql = "INSERT INTO simple3 (address, date_issue) VALUES (%s, %s)"
            cursor.execute(sql, (address, date_issue))

Upvotes: 1

Related Questions