SIM

Reputation: 22440

Can't parse certain fields from a webpage

I've written a script in Python with Selenium to grab the director's name and phone number from a webpage. When I run it, I get the result below, with everything in a single list item:

['Director: Cheryl Hughley\nPhone: 661-421-5861\nEmail: [email protected]']

How can I parse only the name and the phone number from that page on the fly, as separate fields like this:

name: Cheryl Hughley
phone: 661-421-5861

This is what I tried; it produces the single-item list shown in the first example above:

from selenium import webdriver

link = "https://www.nafe.com/bakersfield-nafe-network"

def search_info(driver, url):
    driver.get(url)
    # Keep only the paragraphs that mention a phone number
    info = [
        item.text.strip()
        for item in driver.find_elements_by_css_selector(".markdown p")
        if "Phone" in item.text
    ]
    print(info)

if __name__ == '__main__':
    driver = webdriver.Chrome()
    try:
        search_info(driver, link)
    finally:
        # Always close the browser, even if the scrape fails
        driver.quit()

I don't want to post-process the results after they are parsed; I'd rather extract the fields on the fly. Would regex be a good option here? Thanks.

Upvotes: 1

Views: 42

Answers (1)

Andersson

Reputation: 52665

You can try the solution below:

script = "return arguments[0].childNodes[arguments[1]].textContent;"
info = [
    driver.execute_script(script, item, index).strip()
    for index in [0, 2]
    for item in driver.find_elements_by_css_selector(".markdown p")
    if "Phone" in item.text
]

to get the output

['Director: Cheryl Hughley', 'Phone:  661-421-5861']

or

script = "return arguments[0].childNodes[arguments[1]].textContent;"
info = [
    driver.execute_script(script, item, index).split(": ")[-1].strip()
    for index in [0, 2]
    for item in driver.find_elements_by_css_selector(".markdown p")
    if "Phone" in item.text
]

to get

['Cheryl Hughley', '661-421-5861']
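
Since the question asks about regex: that can also work on the paragraph text itself. Below is only a minimal sketch, assuming the text always has the "Director: ...", newline, "Phone: ..." layout shown in the question. The pattern and the `parse_contact` helper are hypothetical names for illustration, not part of Selenium.

```python
import re

# Sample text in the layout from the question's output (email line left out).
text = "Director: Cheryl Hughley\nPhone: 661-421-5861"

# Assumption: "Director:" and "Phone:" each sit on their own line, in this order.
PATTERN = re.compile(r"Director:\s*(?P<name>.+?)\s*\nPhone:\s*(?P<phone>[\d-]+)")

def parse_contact(text):
    """Return (name, phone), or None if the text does not match the pattern."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group("name"), match.group("phone")

result = parse_contact(text)
if result is not None:
    name, phone = result
    print(f"name: {name}")
    print(f"phone: {phone}")
```

In the script from the question, you could call `parse_contact(item.text)` inside the loop over the matching paragraphs. If the page layout differs from the sample, the pattern would need adjusting.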

Upvotes: 1
