S. Price

Reputation: 81

Unable to find img element with xpath

Can anyone tell me why the code below won't return an emoji attribute...

from selenium.webdriver import Chrome
import time
from selenium.common.exceptions import NoSuchElementException
import re
import pyautogui

# open webpage and allow time to load entirely
driver = Chrome()
driver.implicitly_wait(15)
driver.get("https://twitter.com")
time.sleep(2)

# start scraping tweets
tickerOptDetails = []
tweet_ids = set()
tweet_ids.clear()
print(tweet_ids)


def main():

    # prevent computer from going to sleep
    pyautogui.press('shift')

    print("--checking for new alert...")
    page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')

    for card in page_cards:
        try:
            ticker = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]').text.replace('$', '')
            optCriteria = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]'
                                                     '/../following-sibling::span').text.split('\n')[0]\
                .replace('-', '').replace('$', '')
            emoji = card.find_element_by_xpath("//img[contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f402.svg')"
                                               " or contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f43b.svg')]")\
                .get_attribute("title")
            

            tradeCriteria = str(ticker+optCriteria)
        except NoSuchElementException:
            continue

        if tradeCriteria:
            tweet_id = ' '.join(tradeCriteria)
            if tweet_id not in tweet_ids:
                tweet_ids.add(tweet_id)
                if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria):

                    print(tradeCriteria)
                    print(emoji)

main()

But then the following code will return an emoji attribute...

from selenium.webdriver import Chrome
import time
from selenium.common.exceptions import NoSuchElementException
import re


# open webpage and allow time to load entirely
driver = Chrome()
driver.get("https://twitter.com")
time.sleep(2)

# start scraping tweets
tickerOptDetails = []
emojiSet = []
tweet_ids = set()
last_position = driver.execute_script("return window.pageYOffset;")
scrolling = True
tweet_ids.clear()
print(tweet_ids)
page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')

while scrolling:
    page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')
    for card in page_cards:
        try:
            ticker = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]').text.replace('$', '')
            optCriteria = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]'
                                                     '/../following-sibling::span').text.split('\n')[0].replace('-', '').replace('$', '')
            emoji = card.find_element_by_xpath("//img[contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f402.svg') or"
                                               " contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f43b.svg')]")\
                .get_attribute("title")
            
            tradeCriteria = str(ticker+optCriteria)
        except NoSuchElementException:
            continue

        if tradeCriteria:
            tweet_id = ''.join(tradeCriteria)
            if tweet_id not in tweet_ids:
                tweet_ids.add(tweet_id)
                if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria):

                    print(tradeCriteria)
                    print(emoji)

    scroll_attempt = 0
    while True:
        # check scroll position
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(2)
        curr_position = driver.execute_script("return window.pageYOffset;")
        if last_position == curr_position:
            scroll_attempt += 1

            if scroll_attempt >= 3:
                scrolling = False
                break
            else:
                time.sleep(2)
        else:
            last_position = curr_position
            break

print(tweet_ids)

I know I've added the scrolling to the second code, so it's looking at the entire page and returning the elements I'm looking for. But other than that they're more or less the same. I could run the first code every few seconds and it will never find the emoji element. It will find the ticker and optCriteria no problem and print them together as the tradeCriteria, but it will never find the emoji attribute even if it's there.

I tried both an implicit wait and an explicit wait, but neither one worked. I also tried moving the emoji xpath line inside the if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria): block, but that didn't work either.
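
For reference, the kind of explicit wait I mean looks roughly like this (a sketch reusing the same emoji XPath; the 15-second timeout is arbitrary):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 15 seconds for one of the emoji <img> elements to appear in the DOM
emoji_img = WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((
        By.XPATH,
        "//img[contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f402.svg')"
        " or contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f43b.svg')]"))
)
print(emoji_img.get_attribute("title"))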

Upvotes: 3

Views: 323

Answers (1)

Helpful Person

Reputation: 96

After plugging your two scripts into a comparison checker, it seems the separators passed to join differ: line 43 (first script) has a space, while line 38 (second script) does not.

43: tweet_id = ' '.join(tradeCriteria)
38: tweet_id = ''.join(tradeCriteria)

This space causes a space to be inserted between each character of the tradeCriteria string when it is joined.

43: a b c

38: abc
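
Since tradeCriteria is a string, joining it iterates over its characters. A quick illustration (the value below is made up):

tradeCriteria = "TSLA20230120 C 310"   # hypothetical example value

print(' '.join(tradeCriteria))  # 'T S L A 2 0 2 ...' - a space between every character
print(''.join(tradeCriteria))   # 'TSLA20230120 C 310' - unchanged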

Seeing how the print(emoji) statement only runs after the if tweet_id not in tweet_ids: check in both files, I think this difference is what is causing the problem in the first file.

Alternatively, if you are scraping data from Twitter, you can try using the official Twitter API with a Python wrapper such as Tweepy, as it is slightly easier. You can learn more about how to do that here.
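
A minimal sketch of the Tweepy route, assuming Tweepy v4 and a bearer token from the Twitter developer portal (the token and query below are placeholders):

import tweepy

# placeholder bearer token from the Twitter developer portal
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

# placeholder query; swap in whatever accounts/keywords you are watching
response = client.search_recent_tweets(query="from:TwitterDev", max_results=10)

for tweet in response.data or []:
    print(tweet.id, tweet.text)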

Upvotes: 2
