Reputation: 13
I am attempting to scrape a website using BeautifulSoup. I am largely successful, but I am having two issues:
First, after I get the data from the website, I print it to the screen and also write it to a CSV file. One field on the website is a price with a rupee symbol in front of the actual amount (sample value of the price field: ₹ 10000). When I print the amount to the console, it prints fine. When I try to write it to the CSV file that I open in Excel, I get the error "UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 28". I print other fields to the console and to the file as well; the issue shows up with only two fields, one containing the currency symbol and the other containing a * symbol.
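The failure can be reproduced without any scraping. This is a minimal sketch, assuming the default file encoding on the Windows machine is cp1252 (the actual default depends on the locale); Python implements cp1252 with the 'charmap' codec named in the error. `can_encode` is a hypothetical helper written only for illustration.

```python
# Reproduce the encoding failure independently of the scraper.
# Assumption: the default Windows encoding is cp1252; the real default
# depends on the machine's locale settings.
price = "\u20b9 10000"  # "₹ 10000", as scraped from the page


def can_encode(text, encoding):
    """Hypothetical helper: return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError as exc:
        # cp1252 has no rupee sign, so its 'charmap' codec raises here
        print(f"{encoding}: {exc}")
        return False


print(can_encode(price, "cp1252"))     # fails: cp1252 has no mapping for U+20B9
print(can_encode(price, "utf-8-sig"))  # succeeds: UTF-8 can encode any Unicode character
```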
Second, I have a loop that fetches all result pages for a particular search. The search returns about 344 pages, but the loop stops at about page 43, and the only error message is "HTTP Error 500".
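The cause of the 500 responses is not known from the question; the server may be failing intermittently or rejecting rapid requests. One common way to cope with transient server errors is to catch urllib.error.HTTPError and retry after a pause. This is a sketch under that assumption, not a guaranteed fix: `fetch_with_retry`, its retry count, and its delays are hypothetical choices, and a server that is deliberately blocking the scraper may keep returning 500.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_with_retry(url, retries=3, delay=5.0, opener=urlopen):
    """Hypothetical helper: fetch `url`, retrying on HTTP 5xx errors.

    The wait grows linearly (delay, 2*delay, ...). `opener` defaults to
    urlopen and can be swapped out, e.g. for testing.
    """
    for attempt in range(retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except HTTPError as err:
            # Only server-side errors are worth retrying; re-raise 4xx at once,
            # and re-raise after the last allowed attempt.
            if err.code < 500 or attempt == retries:
                raise
            wait = delay * (attempt + 1)
            print(f"HTTP {err.code} on {url}, retrying in {wait:.0f}s")
            time.sleep(wait)
```

In the page loop, `page_html = fetch_with_retry(my_url)` would take the place of the `uReq(...)`/`read()` pair; adding a short `time.sleep` between pages may also reduce the load on the server.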
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as Soup

filename = "data.csv"
f = open(filename, "w")
headers = "phone_name, phone_price, phone_rating,number_of_ratings, memory, display, camera, battery, processor, Warrenty, security, OS\n"
f.write(headers)
for i in range(2):  # Number of result pages to fetch
    my_url = 'https://www.flipkart.com/search?as=off&as-show=on&otracker=start&page={}&q=cell+phones&viewType=list'.format(i+1)
    print(my_url)
    uClient = uReq(my_url)
    page_html = uClient.read()
    page_soup = Soup(page_html, "html.parser")
    containers = page_soup.findAll("a", {"class": "_1UoZlX"})
    for container in containers:
        phone_name = container.find("div", {"class": "_3wU53n"}).text
        try:
            phone_price = container.find("div", {"class": "_1vC4OE _2rQ-NK"}).text
        except:
            phone_price = 'No Data'
Thank you very much for all your help!
Upvotes: 1
Views: 1346
Reputation: 177386
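The error happens because open() without an encoding argument uses the platform's default encoding, which on Windows is typically a legacy ANSI code page such as cp1252 (the 'charmap' codec in the message) that has no ₹ character. When writing .CSV files for Excel, use the utf-8-sig encoding so that any Unicode character is written correctly. Plain utf-8 avoids the error as well, but without the byte order mark that utf-8-sig adds, Excel on Windows assumes the localized ANSI encoding and displays non-ASCII characters incorrectly. Also open the file with newline='', as the csv module documentation recommends.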
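To see what utf-8-sig changes on disk, here is a minimal sketch that writes the same rows with utf-8-sig and with plain utf-8 and compares the first bytes. `write_rows` is a hypothetical helper, and the files go to a temporary directory rather than the real data.csv.

```python
import csv
import os
import tempfile


def write_rows(path, rows, encoding="utf-8-sig"):
    """Hypothetical helper: write rows to a CSV file.

    newline='' lets the csv module control line endings itself.
    """
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


rows = [["phone_name", "phone_price"], ["Samsung Guru Music 2", "\u20b91,625"]]
tmp_dir = tempfile.mkdtemp()  # left in place so the files can be inspected

sig_path = os.path.join(tmp_dir, "sig.csv")
write_rows(sig_path, rows, encoding="utf-8-sig")
with open(sig_path, "rb") as f:
    raw = f.read()
print(raw[:3])  # the UTF-8 byte order mark EF BB BF

plain_path = os.path.join(tmp_dir, "plain.csv")
write_rows(plain_path, rows, encoding="utf-8")
with open(plain_path, "rb") as f:
    plain_raw = f.read()
print(plain_raw[:3])  # b'pho': no byte order mark, the file starts with the data

# Reading back with utf-8-sig strips the byte order mark again.
with open(sig_path, newline="", encoding="utf-8-sig") as f:
    read_back = list(csv.reader(f))
print(read_back)
```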
#!python3
import csv
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as Soup

filename = "data.csv"
# utf-8-sig writes a byte order mark so Excel recognizes the file as UTF-8;
# newline='' stops the csv module from producing blank rows on Windows.
with open(filename, 'w', newline='', encoding='utf-8-sig') as f:
    w = csv.writer(f)
    headers = 'phone_name phone_price phone_rating number_of_ratings memory display camera battery processor Warrenty security OS'
    w.writerow(headers.split())
    for i in range(2):  # Number of result pages to fetch
        my_url = 'https://www.flipkart.com/search?as=off&as-show=on&otracker=start&page={}&q=cell+phones&viewType=list'.format(i+1)
        print(my_url)
        uClient = uReq(my_url)
        page_html = uClient.read()
        uClient.close()
        page_soup = Soup(page_html, "html.parser")
        containers = page_soup.findAll("a", {"class": "_1UoZlX"})
        for container in containers:
            phone_name = container.find("div", {"class": "_3wU53n"}).text
            try:
                phone_price = container.find("div", {"class": "_1vC4OE _2rQ-NK"}).text
            except AttributeError:  # find() returned None: this listing has no price
                phone_price = 'No Data'
            w.writerow([phone_name, phone_price])
Output:
phone_name,phone_price,phone_rating,number_of_ratings,memory,display,camera,battery,processor,Warrenty,security,OS
"Asus Zenfone 3 Laser (Gold, 32 GB)","₹9,999"
"Intex Aqua Style III (Champagne/Champ, 16 GB)","₹3,999"
"iVooMi i1s (Platinum Gold, 32 GB)","₹7,499"
"Xolo ERA 3X (Posh Black, 16 GB)","₹6,999"
"iVooMi Me1 (Sunshine Gold, 8 GB)","₹3,599"
"Panasonic Eluga A4 (Mocha Gold, 32 GB)","₹9,790"
Samsung Metro 313 Dual Sim,"₹2,025"
"Samsung Galaxy J3 Pro (Gold, 16 GB)","₹6,990"
Samsung Guru Music 2,"₹1,625"
"Panasonic Eluga A4 (Marine Blue, 32 GB)","₹9,640"
"Asus Zenfone 4 Selfie (Black, 32 GB)","₹9,999"
Swipe Elite 3- 4G with VoLTE,"₹3,999"
"Asus Zenfone Max (Black, 16 GB)","₹7,486"
Swipe Elite 3- 4G with VoLTE,"₹3,999"
"Swipe Elite Power (Space Grey, 16 GB)","₹5,499"
"Celkon Diamond Mega (Grey, 16 GB)","₹5,499"
"Asus Zenfone Max (Black, 32 GB)","₹7,999"
"Swipe Elite Power (Champagne Gold, 16 GB)","₹5,499"
"Asus Zenfone 4 Selfie (Gold, 32 GB)","₹9,999"
"Karbonn Aura (Champagne, 8 GB)","₹3,199"
"Infinix Note 4 (Ice Blue, 32 GB)","₹8,999"
"Infinix Note 4 (Milan Black, 32 GB)","₹8,999"
"Moto G5s Plus (Blush Gold, 64 GB)","₹15,990"
"Moto G5s Plus (Lunar Grey, 64 GB)","₹15,940"
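The mix of quoted and unquoted fields in the output comes from csv.writer's default QUOTE_MINIMAL quoting: a field is quoted only when it contains the delimiter, the quote character, or a line break. Every price contains a thousands separator, so every price is quoted, while names without commas (such as "Samsung Guru Music 2") are not. A small sketch using two rows from the output above, written to an in-memory buffer instead of a file:

```python
import csv
import io

buf = io.StringIO()
# Default dialect is "excel": comma delimiter, QUOTE_MINIMAL, "\r\n" line endings.
writer = csv.writer(buf)
writer.writerow(["Samsung Guru Music 2", "\u20b91,625"])
writer.writerow(["Asus Zenfone Max (Black, 16 GB)", "\u20b97,486"])
print(buf.getvalue())
```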
Upvotes: 5