Junhyeok Park

Reputation: 41

Web Scraping: how to extract this kind of div tag?

I am looking at this tag (shown as a screenshot in the original post):

When I run this code:

message = soup.find("div", {"class": "text-msg-container"})

it returned None. What are the _ngcontent-vex-c62 and data-e2e-text-message-content attributes? Do I need to include them too? How should I write the call to get the div tag?

Upvotes: 4

Views: 735

Answers (4)

tallcoder

Reputation: 317

I hope this works:

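from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Path to the downloaded chromedriver on your PC; change it to match your setup.
path = "C:/chromedriver.exe"

# Swap in another driver class if you are not using Chrome.
driver = webdriver.Chrome(service=Service(path))
driver.get("website link")

# Selenium 4.3+ removed find_element_by_class_name; use find_element with By.
out = driver.find_element(By.CLASS_NAME, "text-msg-container")
print(out.text)
driver.quit()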

Upvotes: 1

Mohamed Abdallah

Reputation: 996

You can't, because the div isn't in the HTML you get back from a plain GET request.

The page is built with the Angular framework, which produces a single-page application (SPA). The data is generated by JavaScript code that has to run first to add it to the page, so it isn't in the response you are parsing.

You need a tool that lets the JavaScript run first, such as a real browser driven by Selenium, and only then extract the data you want.
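A quick way to confirm this is to check whether the class appears in the raw HTML at all. The following is a minimal sketch using only the standard library. Both HTML strings are made-up stand-ins (assumptions, not the real page): one for what a plain GET might return, and one for the markup after the JavaScript has run. `ClassFinder` and `has_class` are hypothetical helper names.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any start tag carries the given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs. Angular's _ngcontent-* and
        # data-e2e-* entries are ordinary attributes and are simply skipped.
        for name, value in attrs:
            if name == "class" and value and self.class_name in value.split():
                self.found = True


def has_class(html, class_name):
    """Return True if any element in html has class_name among its classes."""
    finder = ClassFinder(class_name)
    finder.feed(html)
    return finder.found


# Hypothetical raw response: an empty application shell.
raw_html = (
    "<html><body><app-root></app-root>"
    "<script src='main.js'></script></body></html>"
)

# Hypothetical markup after the browser has run the JavaScript.
rendered_html = (
    '<html><body><app-root><div _ngcontent-vex-c62="" '
    'data-e2e-text-message-content="" class="text-msg-container">Hello</div>'
    "</app-root></body></html>"
)

print(has_class(raw_html, "text-msg-container"))       # False
print(has_class(rendered_html, "text-msg-container"))  # True
```

If the check comes back False for the text your HTTP client actually downloaded, the selector is not the problem: the element has not been created yet. The extra attributes do not need to be part of the lookup; matching on the class is enough once the element exists.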

Upvotes: 2

tallcoder

Reputation: 317

Try this (the keyword argument is class_, with a trailing underscore, because class is a reserved word in Python):

message = soup.find("div", class_="text-msg-container")

Upvotes: 1

vitaliis

Reputation: 4212

If you want to find the element with class text-msg-container, try Selenium. It drives a real browser, so it can locate elements that JavaScript adds to the page.

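import unittest

from selenium import webdriver
from selenium.webdriver.common.by import By


class PythonSearch(unittest.TestCase):

    def setUp(self):
        self.driver = webdriver.Firefox()

    def test_search(self):
        driver = self.driver
        driver.get("http://www.yoursite.com")
        # find_element_by_css_selector was removed in Selenium 4.3; use By instead.
        elem = driver.find_element(By.CSS_SELECTOR, ".text-msg-container")

    def tearDown(self):
        # quit() ends the whole browser session, not just the current window.
        self.driver.quit()


if __name__ == "__main__":
    unittest.main()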

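To test in Chrome, use driver = webdriver.Chrome() instead. In Selenium 4, if chromedriver is not on your PATH, pass its location as service=Service('/path/to/chromedriver'), with Service imported from selenium.webdriver.chrome.service. For more information on chromedriver, see https://chromedriver.chromium.org/getting-started . For a Selenium getting-started guide, see https://selenium-python.readthedocs.io/getting-started.html#simple-usage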

Upvotes: 1
