Reputation: 87
This code works for almost every webpage I want to scrape, but for this page, https://www.usana.com/ux/dotcom/#!/enu-US/contact, it returns just one line of text, even though many addresses are visible on the page:
options = webdriver.ChromeOptions() # for cookies
options.add_argument(r"C:\Users\XXXXX\Selenium") # this is the directory for the cookies
driver = webdriver.Chrome(r'C:\Users\XXXXX\XXXXXX\Documents\chromedriver.exe', options=options)
driver.set_page_load_timeout(100)
driver.get("https://www.usana.com/ux/dotcom/#!/enu-US/contact")
time.sleep(30)
try:
    click_alert=driver.switch_to.alert() # to click on the pop up window
    click_alert.accept()
except:
    pass
res = requests.get(driver.current_url,headers = headers)
soup = BeautifulSoup(res.content, 'lxml')
txt = soup.text
print(txt)
I have tried to handle the cookie-consent message and the pop-up window that appears on the page, but the output is still just this one line:
USANA Health Sciences
I want to extract all the addresses on this page as text.
What needs to be added or changed in the code above? Any help is highly appreciated.
Upvotes: 1
Views: 151
Reputation: 87
I have also found another way to solve this:
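import time
from selenium import webdriver
from selenium.common.exceptions import NoAlertPresentException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
browser = webdriver.Chrome(r'C:\Users\XXXXXX\XXXX\Documents\chromedriver.exe')
browser.set_page_load_timeout(100)  # set the timeout before loading the page
browser.get(url)  # url is the page address from the question
time.sleep(3)
try:
    # switch_to.alert is a property, not a method
    browser.switch_to.alert.accept()
except NoAlertPresentException:
    pass
try:
    # click any element whose text contains "agree", ignoring case
    wait(browser,10).until(EC.element_to_be_clickable((By.XPATH,"//*[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') , 'agree')]"))).click()
except TimeoutException:
    pass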
Upvotes: 0
Reputation: 9430
You need to pass driver.page_source to BeautifulSoup. The URI fragment (everything after the # in the URL) is never sent to the server, so the server returns the same base page and the contact content is presumably rendered in the browser by JavaScript. requests neither sends the fragment nor executes JavaScript, so the HTML it receives does not contain the rendered page.
Clients are not supposed to send URI fragments to servers: https://en.wikipedia.org/wiki/URI_fragment
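To make the fragment point concrete, here is a minimal sketch using only the standard library. The helper name request_target is made up for illustration; it approximates the path an HTTP client would put on the request line, and shows that the part after # is not in it.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Approximate the path (plus query) a client sends to the server.

    Hypothetical helper for illustration; the fragment is deliberately left out,
    because HTTP clients do not transmit it.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


url = "https://www.usana.com/ux/dotcom/#!/enu-US/contact"
parts = urlsplit(url)

print("fragment (stays in the browser):", parts.fragment)
print("sent to the server:", request_target(url))
```

Every "page" under /ux/dotcom/#!/... therefore looks the same to the server, which is consistent with requests returning only the bare shell of the page.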
import time
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.common.exceptions import NoSuchElementException, ElementNotInteractableException
options = webdriver.ChromeOptions()
options.add_argument(r"C:\Users\XXXXX\Selenium")
driver = webdriver.Chrome(r'C:\Users\XXXXX\XXXXXX\Documents\chromedriver.exe', options=options)
driver.set_page_load_timeout(100)
driver.get("https://www.usana.com/ux/dotcom/#!/enu-US/contact")
time.sleep(3)
try:
    driver.find_element_by_xpath("//*[contains(text(), 'OK')]").click()
except NoSuchElementException:
    print("Alert already accepted")
try:
    driver.find_element_by_class_name("optanon-allow-all").click()
except ElementNotInteractableException:
    print("Cookies already accepted")
soup = BeautifulSoup(driver.page_source, 'lxml')
headers = [x.get_text(separator=" ").strip() for x in soup.find_all('div', {'class': 'card-header'})]
bodies = [x.get_text(separator=" ").strip() for x in soup.find_all('div', {'class': 'card-body'})]
print(list(zip(headers, bodies)))
driver.quit()
Outputs:
[('USANA Hong Kong', '5/F, Sino Plaza 255-257 Gloucester Road Causeway Bay, Hong Kong Customer Services Hotline: (852) 2162 1888 Order Express: (852) 2162 1800 Office Hours Customer service Monday–Friday 12:00 p.m.–9:00 p.m. (HKT) Saturday 11:00 a.m.–4:00 p.m. The office is closed on Sundays and public holidays. [email protected]'),
 ('USANA Japan', 'USANA Health Sciences Japan LLC. Ichigaya MS Bldg 2F 4-1-9 Kudankita, Chiyoda-ku Tokyo, Japan 102-0073 Contact Information Phone: 03-5215-3050 (Out of Japan: +81-3-5215-3050) Fax: 03-5215-3052 (Out of Japan: +81-3-5215-3052) Office Hours Customer Service: Monday–Friday 10:00 a.m. to 1:00 p.m., 2:00 p.m.to 6:00 p.m. Office Counter: Monday–Friday 13:00 - 19:00 [email protected]'),
 ('USANA Health Sciences Korea, Ltd.', '5F SI Tower 203, Teheran-ro, Gangnam-gu Seoul, Korea 06141 Tel: +82-2-2192-7300 Fax:+82-2-2192-7399 Customer Phone Service Line Business Hour Mon - Fri / 9:00 AM - 6:00 PM Sat / 9:00 AM - 1:00 PM (Closed Sun & Holiday) Customer Will-Call Service Line Business Hour Mon - Fri 9:00AM - 6:00PM (Closed Sat-Sun & Holiday) [email protected]'),
 ('USANA Taiwan', '7F, No. 99, Fu-Hsin N. Road, Taipei 105, Taiwan, Republic of China Tel:+886-2-7724-8000 Fax: +886-2-7724-1000 Customer Service line: 0809-085-588 Customer Service Fax: 0809-085-500 Office Hours Monday–Friday 9:00 a.m. to 6:00 p.m. (CST) Closed weekends and national holidays. Customer Service and Will Call Monday–Friday 12:00 noon to 9:00 p.m. Closed weekends and national holidays. [email protected]'),
 ('USANA Australia', '3 Hudson Avenue Castle Hill NSW 2154, Australia Customer Service Phone: +612 9842 4600 Toll-free: 1800 687 872 Sydney Business Center Opening Hours Mon 8:30 a.m. to 5:00 p.m. Tue 8:30 a.m. to 5:00 p.m. Wed 8:30 a.m. to 5:00 p.m. Thu 9:00 a.m. to 5:00 p.m. Fri 8:30 a.m. to 5:00 p.m. Sat 9:30 a.m. to 3:00 p.m. Sundays and public holidays:Closed [email protected]'),
 ('USANA Malaysia', 'USANA Malaysia UHS Essential Health (Malaysia) Sdn. Bhd. Unit M2-2 & M2-5, Level M2, The Vertical Podium Avenue 3, Bangsar South No. 8 Jalan Kerinchi 59200 Kuala Lumpur Telephone: 603-2246 0800 Facsimile: 603-2246 0901 Office Hours Monday–Friday 11:30 a.m. to 7:30 p.m. (MYT) Saturday 10:30 a.m. to 1:30 p.m. Sundays and public holidays: Closed [email protected]'),
 ('USANA Philippines', 'UHS Essential Health Philippines, Inc. 24th Floor, Tower 1, The Enterprise Center, 6766 Ayala Avenue corner Paseo de Roxas, Makati City, Philippines 1200 Customer Service: (632) 858-4500 Phone Order Line (632) 858-4599 Fax Order Line Office Hours Business Center Monday-Friday 11:00am to 8:00pm Saturday 9:00 am to 1:00pm Customer Service Monday-Friday 11:00am to 8:00pm Saturday 9:00 am to 1:00pm [email protected]'),
 ('USANA New Zealand', 'P.O. Box 17409, Greenlane 1546, AUCKLAND Level 1, 93 Ascot Avenue, Greenlane, Auckland 1051 Customer Service Phone: +64 9 415 2750 Toll-free: 0800 872 626 Auckland Business Center Opening Hours Mon 9:00 a.m. to 7:00 p.m. Tue 9:00 a.m. to 5:00 p.m. Wed 9:00 a.m. to 5:00 p.m. Thu 9:00 a.m. to 5:00 p.m. Fri 9:00 a.m. to 5:00 p.m. Sat 10:00 a.m. to 3:00 p.m. Sundays and public holidays: Closed [email protected]'),
 ('USANA Health Sciences Singapore Pte Ltd', '391B Orchard Road, Ngee Ann City Tower B, #19-01/02 Singapore 238874 Customer Service: (65) 6820-8828 Fax: (65) 6820-7007 Business Hours: Mon to Fri - 1230hr to 2030hr Saturday - 1030hr to 1400hr Sunday / Public Holiday - Closed [email protected]'),
 ('USANA Health Sciences (Thailand) Ltd.', 'Unit 01-04 Chamchuri Square Building 319 Phyathai Road Pathumwan, Bangkok 10330 Distributor Services: 02-761-4300 Customer Service and Will Call Monday–Friday: 11.00 a.m.-8.00 p.m. Saturday: 1.00 p.m.-5.00 p.m. Closed Sunday and national holidays [email protected]'),
 ('USANA Health Sciences Indonesia', 'Menara Jamsostek South Tower 14th Floor Jalan Gatot Subroto Kav 38 Jakarta 12710 Indonesia Contact Information Reception Phone: +62 21 278 38 600 Customer Service Call Center: 1500847 Customer Service Fax: +62 21 278 38 688 Office Hours Business Centre Monday to Friday: 11:00 a.m.–11:00 p.m. Saturday: 9:00 a.m.–6:00 p.m. Customer Service Monday to Friday: 11:00 a.m.–8:00 p.m. Saturday: 9:00 a.m.–1:00p.m Facebook https://www.facebook.com/officialusanaindonesia [email protected]'),
 ('USANA Netherlands', 'USANA Health Sciences 92, avenue des Ternes Paris, France 75017 Distributor Services: 0800-022-7288 Fax: 001-801-954-7240 Customer Service Hours Monday through Friday Reception, meeting rooms and will call: Tuesday, Thursday and Friday 12:30 p.m.–8:00 p.m. Wednesday 12:30 pm – 4:00 pm. Saturday 10:00 a.m.–5:00 p.m. Call center: 9:00 a.m.– midnight (GMT+1). [email protected]'),
 ('USANA United Kingdom', 'USANA United Kingdom Customer service representatives located in Salt Lake City, Utah, support the Associates and Preferred Customers in the United Kingdom. Customer Service: 08 08 234 4478 Fax: 08 08 234 2472 Opening Hours: Monday – Friday (excluding some holidays) 1:30 p.m. to 2:00 a.m. (London time). Customer Service Hours Monday through Friday 6:30 AM to 9:00 PM MST. [email protected]'),
 ('USANA France/Belgium, Paris Office', 'USANA Europe (Paris Office) \n121 Av. Des Champs Élysées \n75008 Paris, France \nDoor Code: B152 \nOrder pick-up: \nThursday 12:30 - 10:00pm \nFriday 12:30 - 8:00pm \nSaturday 12:30 - 7:00pm \nMeeting rooms upon reservation: \[email protected] \n \nCustomer Service: \nFrance: +33 1 42 99 76 50 \nRomania: +40 312 295 242 \nGermany: 0800 1825899 \nBelgium: 0 800 14 432 \nSpain: 900 941 696 \nItaly: 800 790 241 \nUK: 08 08 234 4478 (calls to SLC office) [email protected]'),
 ('USANA United States', '3838 West Parkway Boulevard Salt Lake City, UT 84120 Receptionist and Investor Relations Phone: 801-954-7100 Fax: 801-954-7300 [email protected] Hours: 9:00 am – 5:00 pm MST Customer Service Phone: 1-888-950-9595 Fax: 1-800-289-8081 Languages available: English, Spanish, French, Mandarin, Cantonese, & Korean Hours: 6:30am – 9:00pm MST [email protected]'),
 ('USANA Puerto Rico', '3838 West Parkway Boulevard Salt Lake City, UT 84120 Receptionist and Investor Relations Phone: 801-954-7100 Fax: 801-954-7300 [email protected] Hours: 9:00 am – 5:00 pm MST Customer Service Phone: 1-888-950-9595 Fax: 1-800-289-8081 Languages available: English, Spanish, French, Mandarin, Cantonese, & Korean Hours: 6:30am – 9:00pm MST [email protected]'),
 ('Caribbean', 'Customer Service (toll-free): 1-888-950-9595 Fax: 1-801-954-7300 Trinidad and Tobago Customer Service (toll-free): 1-888-667-3574 Fax: 1-801-954-7300 The Dominican Republic Customer Service (toll-free): 1-888-751-2425 Fax: 1-801-954-7300 Office Hours Monday–Friday (excluding some holidays) 6:30 a.m.–9:00 p.m. (MST/MDT). The Caribbean is two hours ahead of Mountain Time. [email protected]'),
 ('USANA Canada, Ontario Office', '80 Innovation Dr. Woodbridge, ON L4H0T2 CANADA Customer Service Phone: 1-888-950-9595 Fax: 1-800-289-8081 Languages available: English, Spanish, French, Mandarin, Cantonese, & Korean Hours: 6:30am – 9:00pm MST Investor Relations Phone: 801-954-7100 Fax: 801-954-7300 [email protected] Hours: 8:00 am – 6:00 pm MST Office Hours Monday: 9:00 a.m. to 5:00 p.m.* Tuesday: 9:00 a.m. to 5:00 p.m.* Wednesday: 9:00 a.m. to 5:00 p.m. Thursday: 9:00 a.m. to 5:00 p.m. Friday: 9:00 a.m. to 5:00 p.m.* Office hours are in EDT/EST *Associates are asked to contact the office if they require assistance prior to 9 a.m. or after 5 p.m., as we will be pleased to make arrangements to accommodate them. [email protected]'),
 ('USANA Canada, Vancouver Office', 'Suite 2118, 13353 Commerce Parkway Richmond, British Columbia CANADA V6V 3A1 Customer Service Phone: 1-888-950-9595 Fax: 1-800-289-8081 Languages available: English, Spanish, French, Mandarin, Cantonese, & Korean Hours: 6:30am – 9:00pm MST Investor Relations Phone: 801-954-7100 Fax: 801-954-7300 [email protected] Hours: 8:00 am – 6:00 pm MST Office Hours \nMonday to Friday, 11:00 a.m. to 7:00 p.m. (PDT/PST) \n [email protected]'),
 ('USANA Mexico S.A. de C.V.', 'Av. paseo de las Palmas 525, piso no. 8 Col. Lomas de Chapultepec, Del. Miguel Hidalgo México D.F. C.P. 11000 Reception: (55) 5093-9650 Distributor Services / Order Express: 01 800 08 USANA (87262) Fax: 01 800 08 USANA (87262) Office Hours Monday–Friday 9:00 a.m. to 6:00 p.m. (CST/CDT) Saturday 9:00 a.m. to 2:00 p.m. (CST/CDT)—Will Call only [email protected]'),
 ('USANA Health Sciences Colombia, S.A.S.', 'Calle 100 No. 13 - 76 Piso 4to Torre Mansarovar Bogotá D.C. Colombia Main Office Phone: (57) 1-546-3939 Fax: (57) 1-546-3951 Distributor Services Phone: (57) 1-546-3939 Toll Free: 01 8000 963750 Fax: (57) 1-546-3950 Office Hours Monday – Friday (Not holidays) / 9:00 a.m. – 6:00 p.m. COT Saturday / 9:00 a.m. – 12:30 p.m. COT Customer Service Hours Monday - Friday 9:00 a.m. – 6:00 p.m. COT Closed Sundays and Holidays [email protected]'),
 ('', ''),
 ('', ''),
 ('', ''),
 ('', '')]
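The list ends with four ('', '') pairs, presumably from card elements on the page that have no text. A small sketch of how they could be dropped before printing; the sample data is shortened from the output above and the helper name drop_empty is an assumption, not part of the original answer.

```python
# Shortened sample of the (header, body) pairs produced by the scraper above.
pairs = [
    ("USANA Hong Kong", "5/F, Sino Plaza 255-257 Gloucester Road"),
    ("USANA Japan", "Ichigaya MS Bldg 2F 4-1-9 Kudankita"),
    ("", ""),
    ("", ""),
]


def drop_empty(pairs):
    """Keep only pairs where the header or the body contains some text."""
    return [(header, body) for header, body in pairs if header or body]


offices = drop_empty(pairs)
for header, body in offices:
    print(header, "->", body)
```

In the answer's code this would amount to wrapping the zipped result, for example drop_empty(zip(headers, bodies)).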
Updated in response to comment
If you want the complete text of the page, replace the last four lines above with:
print(soup.get_text(separator=" ").strip())
driver.quit()
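The full-page text will likely contain newlines and runs of spaces (the \n sequences in the Paris entry of the output are an example). If that matters, one possible clean-up step is sketched below; normalize_whitespace is a hypothetical helper name, and the sample string is taken from that entry.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return " ".join(text.split())


sample = "USANA Europe (Paris Office) \n121 Av. Des Champs Élysées \n75008 Paris, France"
print(normalize_whitespace(sample))
```

It could be applied directly to the result of soup.get_text(separator=" ").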
Upvotes: 1