Reputation: 81
Can anyone tell me why the code below won't return an emoji attribute...
from selenium.webdriver import Chrome
import time
from selenium.common.exceptions import NoSuchElementException
import re
import pyautogui  # needed for pyautogui.press() in main()
# open webpage and allow time to load entirely
driver = Chrome()
driver.implicitly_wait(15)
driver.get("https://twitter.com")
time.sleep(2)
# start scraping tweets
tickerOptDetails = []
tweet_ids = set()
tweet_ids.clear()
print(tweet_ids)
def main():
    # prevent computer from going to sleep
    pyautogui.press('shift')
    print("--checking for new alert...")
    page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')
    for card in page_cards:
        try:
            ticker = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]').text.replace('$', '')
            optCriteria = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]'
                                                     '/../following-sibling::span').text.split('\n')[0]\
                .replace('-', '').replace('$', '')
            emoji = card.find_element_by_xpath("//img[contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f402.svg')"
                                               " or contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f43b.svg')]")\
                .get_attribute("title")
            tradeCriteria = str(ticker+optCriteria)
        except NoSuchElementException:
            continue
        if tradeCriteria:
            tweet_id = ' '.join(tradeCriteria)
            if tweet_id not in tweet_ids:
                tweet_ids.add(tweet_id)
                if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria):
                    print(tradeCriteria)
                    print(emoji)
main()
But then the following code will return an emoji attribute...
from selenium.webdriver import Chrome
import time
from selenium.common.exceptions import NoSuchElementException
import re
# open webpage and allow time to load entirely
driver = Chrome()
driver.get("https://twitter.com")
time.sleep(2)
# start scraping tweets
tickerOptDetails = []
emojiSet = []
tweet_ids = set()
last_position = driver.execute_script("return window.pageYOffset;")
scrolling = True
tweet_ids.clear()
print(tweet_ids)
page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')
while scrolling:
    page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')
    for card in page_cards:
        try:
            ticker = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]').text.replace('$', '')
            optCriteria = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]'
                                                     '/../following-sibling::span').text.split('\n')[0].replace('-', '').replace('$', '')
            emoji = card.find_element_by_xpath("//img[contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f402.svg') or"
                                               " contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f43b.svg')]")\
                .get_attribute("title")
            tradeCriteria = str(ticker+optCriteria)
        except NoSuchElementException:
            continue
        if tradeCriteria:
            tweet_id = ''.join(tradeCriteria)
            if tweet_id not in tweet_ids:
                tweet_ids.add(tweet_id)
                if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria):
                    print(tradeCriteria)
                    print(emoji)
    scroll_attempt = 0
    while True:
        # check scroll position
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(2)
        curr_position = driver.execute_script("return window.pageYOffset;")
        if last_position == curr_position:
            scroll_attempt += 1
            if scroll_attempt >= 3:
                scrolling = False
                break
            else:
                time.sleep(2)
        else:
            last_position = curr_position
            break
print(tweet_ids)
I know I've added scrolling to the second script, so it looks at the entire page and returns the elements I'm looking for, but other than that the two are more or less the same. I could run the first script every few seconds and it would never find the emoji element. It finds the ticker and optCriteria with no problem and prints them together as tradeCriteria, but it never finds the emoji attribute, even when it's there.
I tried both implicit and explicit waits, but neither worked. I also tried moving the emoji XPath line inside the if statement if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria):, but that didn't work either.
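For reference, the filter in that if statement can be exercised on its own, without a browser. This is only a sketch: the helper name passes_filter and the sample strings are made up for illustration and are not taken from any real tweet.

```python
import re


def passes_filter(trade_criteria):
    """Apply the same length and regex check the scripts use before printing."""
    has_pattern = re.search(r'\d{8} \D ', trade_criteria) is not None
    return 13 < len(trade_criteria) < 22 and has_pattern


# Hypothetical sample values, chosen only to show one pass and two failures.
print(passes_filter("AAPL20210115 C 130"))  # 8 digits, space, non-digit, space
print(passes_filter("AAPL20210115C130"))    # right length, but no spaces
print(passes_filter("AAPL"))                # too short
```

A card has to satisfy both conditions before either tradeCriteria or emoji is printed, so a string that fails here produces no output at all.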
Upvotes: 3
Views: 323
Reputation: 96
After plugging your code into a comparison checker, it seems the two scripts differ by a single space, on lines 43 and 38 respectively.
43: tweet_id = ' '.join(tradeCriteria)
38: tweet_id = ''.join(tradeCriteria)
Because tradeCriteria is a string, that extra space ends up between every character of it when joined.
43: a b c
38: abc
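A quick way to see the effect is to run both joins on a sample string. The value below is a made-up stand-in for tradeCriteria, not output from the page:

```python
# Hypothetical stand-in for tradeCriteria (a plain string, as built by str(ticker + optCriteria)).
trade_criteria = "abc"

# str.join iterates over the characters of a string argument.
spaced = ' '.join(trade_criteria)    # separator placed between every character
unspaced = ''.join(trade_criteria)   # empty separator, string is unchanged

print(repr(spaced))
print(repr(unspaced))
```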
Seeing how the print(emoji) statement comes after if tweet_id not in tweet_ids: in both files, I think this difference is what is causing the problem in the first file.
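To make the role of that guard concrete, here is a small sketch of the de-duplication step on its own, with no browser involved. The function name report_new, the sep parameter, and the sample values are hypothetical, and the length/regex filter from the scripts is left out. It only illustrates that anything printed inside if tweet_id not in tweet_ids: appears once per distinct ID.

```python
def report_new(trade_criteria, emoji, seen_ids, sep=''):
    """Return what would be printed for one card and record its ID in seen_ids."""
    tweet_id = sep.join(trade_criteria)
    if tweet_id in seen_ids:
        return []  # ID already recorded, so nothing would be printed
    seen_ids.add(tweet_id)
    return [trade_criteria, emoji]


# Hypothetical sample card, processed twice as if it showed up on two passes.
seen = set()
first = report_new("abc", "ox", seen)
second = report_new("abc", "ox", seen)
print(first)
print(second)
```

Passing sep=' ' instead stores the spaced-out form of the ID, which is the only thing that differs between the two files at this step.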
Alternatively, if you are scraping data from Twitter, you can try using the official Twitter API with a Python wrapper such as Tweepy, as it is slightly easier. You can learn more about how to do that here.
Upvotes: 2