venkat s

Reputation: 13

Python scraping encoding issues

I am attempting to scrape a website using BeautifulSoup. I am largely successful but am having two issues:

  1. After I get the data from the website, I print it to the screen and also write it to a CSV file. The website has a price field with a rupee symbol in front of the actual amount (sample price field: ₹ 10000). When I print the amount to the console, it prints fine. When I write it to the CSV file for Excel, I get the error "UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 28". I print other fields to both the console and the file, and the issue shows up only with two fields: the one with the currency symbol and another with a * symbol.

  2. I have a loop that fetches all result pages for a particular search. The search returns about 344 pages, but the loop stops at about page 43 with only "HTTP Error 500" as the error message.
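Issue 1 can be reproduced without any scraping. The 'charmap' codec in the message indicates the file was opened with a Windows code page, assumed here to be cp1252. The sketch below (can_encode is a hypothetical helper, not part of the original code) checks whether the price text fits in that code page and in UTF-8:

```python
RUPEE = "\u20b9"  # '₹', the character named in the error message


def can_encode(text, encoding):
    """Return True if every character of text can be represented in the codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


price = RUPEE + " 10000"
print(can_encode(price, "cp1252"))  # False: cp1252 (assumed Windows default) has no ₹
print(can_encode(price, "utf-8"))   # True: UTF-8 can encode any Unicode character
```

This is why the same string can print to the console without trouble and still fail in f.write(): the console and the file do not necessarily use the same encoding.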

    import bs4
    from urllib.request import urlopen as uReq

    from bs4 import BeautifulSoup as Soup
    filename = "data.csv"
    f = open(filename,"w")
    headers = "phone_name, phone_price, phone_rating,number_of_ratings, memory, display, camera, battery, processor, Warrenty, security, OS\n"
    f.write(headers)


    for i in range(2):      # Number of pages minus one
        my_url = 'https://www.flipkart.com/search?as=off&as-show=on&otracker=start&page={}&q=cell+phones&viewType=list'.format(i+1)
        print(my_url)

        uClient=uReq(my_url)

        page_html=uClient.read()

        page_soup = Soup(page_html,"html.parser")

        containers=page_soup.findAll("a", {"class":"_1UoZlX"})


    for container in containers:
        phone_name = container.find("div",{"class":"_3wU53n"}).text

        try:
            phone_price = container.find("div",{"class":"_1vC4OE _2rQ-NK"}).text
        except:
            phone_price = 'No Data'
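In the page loop above, uReq(my_url) raises urllib.error.HTTPError as soon as the server answers with a 500, and since nothing catches it the whole run stops, which is consistent with issue 2. A minimal sketch of a retry wrapper, assuming the 500s are transient (for example, the site pushing back on rapid requests). The function fetch_with_retry and its retries, delay and opener parameters are hypothetical, not part of the original code:

```python
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_with_retry(url, retries=3, delay=2.0, opener=urlopen):
    """Fetch url and return the body, retrying server errors (5xx).

    opener defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access.
    """
    for attempt in range(retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except HTTPError as err:
            # Client errors (4xx) will not fix themselves; the last attempt gives up.
            if err.code < 500 or attempt == retries:
                raise
            time.sleep(delay * (attempt + 1))  # pause longer after each failure
```

In the loop, page_html = fetch_with_retry(my_url) would replace the uReq and read() lines. If the server is rate limiting, a short time.sleep() between pages may also help, though that is an assumption about the site's behavior.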
    

Thank you very much for all your help!

Upvotes: 1

Views: 1346

Answers (1)

Mark Tolonen

Reputation: 177386

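When writing .csv files for Excel, use the utf-8-sig encoding so that any Unicode character is written correctly. The UnicodeEncodeError happens because open() without an encoding argument uses the locale's default encoding, which on Windows is an ANSI code page such as cp1252 (handled by Python's 'charmap' codec) that has no ₹ (U+20B9). Plain utf8 would avoid the error, but when the file has no byte order mark (BOM), Excel on Windows assumes the localized ANSI encoding and displays non-ASCII characters incorrectly; utf-8-sig writes that BOM.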
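To see what utf-8-sig changes, the sketch below writes the same two rows with plain utf-8 and with utf-8-sig into a temporary directory and checks the first three bytes of each file. Both encodings can represent ₹; the only difference is the BOM that Excel uses to detect UTF-8. The helper names write_csv and has_bom are illustrative only:

```python
import codecs
import csv
import os
import tempfile


def write_csv(path, rows, encoding):
    """Write rows with csv.writer; newline='' lets the csv module control line endings."""
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


def has_bom(path):
    """True if the file starts with the UTF-8 byte order mark (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


rows = [["phone_name", "phone_price"],
        ["Asus Zenfone 3 Laser (Gold, 32 GB)", "₹9,999"]]

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for enc in ("utf-8", "utf-8-sig"):
        path = os.path.join(tmp, enc + ".csv")
        write_csv(path, rows, enc)
        results[enc] = has_bom(path)

print(results)  # {'utf-8': False, 'utf-8-sig': True}
```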

#!python3
import csv
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as Soup

filename = "data.csv"
# utf-8-sig writes a BOM so Excel recognizes the file as UTF-8;
# newline='' is what the csv module expects for files it writes.
with open(filename,'w',newline='',encoding='utf-8-sig') as f:
    w = csv.writer(f)
    headers = 'phone_name phone_price phone_rating number_of_ratings memory display camera battery processor Warrenty security OS'
    w.writerow(headers.split())

    for i in range(2):      # number of result pages to fetch (pages 1 and 2)
        my_url = 'https://www.flipkart.com/search?as=off&as-show=on&otracker=start&page={}&q=cell+phones&viewType=list'.format(i+1)
        print(my_url)
        uClient = uReq(my_url)
        page_html = uClient.read()
        uClient.close()
        page_soup = Soup(page_html,"html.parser")
        containers = page_soup.findAll("a", {"class":"_1UoZlX"})

        # Process each page's results inside the page loop;
        # otherwise only the last page's rows are written.
        for container in containers:
            phone_name = container.find("div",{"class":"_3wU53n"}).text
            try:
                phone_price = container.find("div",{"class":"_1vC4OE _2rQ-NK"}).text
            except AttributeError:   # price div missing, so find() returned None
                phone_price = 'No Data'
            w.writerow([phone_name,phone_price])

Output:

phone_name,phone_price,phone_rating,number_of_ratings,memory,display,camera,battery,processor,Warrenty,security,OS
"Asus Zenfone 3 Laser (Gold, 32 GB)","₹9,999"
"Intex Aqua Style III (Champagne/Champ, 16 GB)","₹3,999"
"iVooMi i1s (Platinum Gold, 32 GB)","₹7,499"
"Xolo ERA 3X (Posh Black, 16 GB)","₹6,999"
"iVooMi Me1 (Sunshine Gold, 8 GB)","₹3,599"
"Panasonic Eluga A4 (Mocha Gold, 32 GB)","₹9,790"
Samsung Metro 313 Dual Sim,"₹2,025"
"Samsung Galaxy J3 Pro (Gold, 16 GB)","₹6,990"
Samsung Guru Music 2,"₹1,625"
"Panasonic Eluga A4 (Marine Blue, 32 GB)","₹9,640"
"Asus Zenfone 4 Selfie (Black, 32 GB)","₹9,999"
Swipe Elite 3- 4G with VoLTE,"₹3,999"
"Asus Zenfone Max (Black, 16 GB)","₹7,486"
Swipe Elite 3- 4G with VoLTE,"₹3,999"
"Swipe Elite Power (Space Grey, 16 GB)","₹5,499"
"Celkon Diamond Mega (Grey, 16 GB)","₹5,499"
"Asus Zenfone Max (Black, 32 GB)","₹7,999"
"Swipe Elite Power (Champagne Gold, 16 GB)","₹5,499"
"Asus Zenfone 4 Selfie (Gold, 32 GB)","₹9,999"
"Karbonn Aura (Champagne, 8 GB)","₹3,199"
"Infinix Note 4 (Ice Blue, 32 GB)","₹8,999"
"Infinix Note 4 (Milan Black, 32 GB)","₹8,999"
"Moto G5s Plus (Blush Gold, 64 GB)","₹15,990"
"Moto G5s Plus (Lunar Grey, 64 GB)","₹15,940"

Excel:

[Screenshot: data.csv opened in Excel]
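In the output above, names that contain a comma (and every price, because of the thousands separator) are wrapped in double quotes, while Samsung Metro 313 Dual Sim is not. That is csv.writer's default QUOTE_MINIMAL behavior, and it is also why csv.writer is safer than joining fields with commas by hand, where ₹9,999 would be split across two columns. A small check with an in-memory buffer (to_csv_line is an illustrative helper):

```python
import csv
import io


def to_csv_line(fields):
    """Format one row exactly as csv.writer does with the default (excel) dialect."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()


# Only fields containing the delimiter, quote char or a newline get quoted.
print(repr(to_csv_line(["Samsung Metro 313 Dual Sim", "₹2,025"])))
# 'Samsung Metro 313 Dual Sim,"₹2,025"\r\n'
print(repr(to_csv_line(["Asus Zenfone 3 Laser (Gold, 32 GB)", "₹9,999"])))
# '"Asus Zenfone 3 Laser (Gold, 32 GB)","₹9,999"\r\n'
```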

Upvotes: 5
