Reputation: 3
I am trying to do sentiment analysis on Twitter using Python. Here is my code:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome()
base_url = u'htttps://twitter.com/search?q='
query = u'seattlepacificuniversity'
url = base_url + query
browser = webdriver.Chrome()
browser.get(url)
time.sleep(1)
body = browser.find_elements_by_tag_name('body')
for _ in range(100):
    body.send.keys(Keys.PAGE_DONW)
    time.sleep(0.2)
tweets = browse.find_elements_by_class_name('tweet-text')
for tweet in tweets:
    print(tweet.text)
The problem is that after running the code, two new windows popped up - one with the seattlepacificuniversity search and another named "data" showing only blank space and the message "Chrome is being controlled by automated software". How can I retrieve only the tweet text for further cleaning and analysis?
Sorry, I am new to the Python world.
Upvotes: 0
Views: 1238
Reputation: 1119
I've made some corrections to your code below. I don't know if this ultimately does what you intended, but it now performs a number of page downs, then finds all the tweets and iterates through them, printing the text of each. You may need to tweak it further if this doesn't produce exactly the results you want, but it is now working.
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome()
base_url = u'https://twitter.com/search?q='
query = u'seattlepacificuniversity'
url = base_url + query
browser.get(url)
time.sleep(1)
body = browser.find_element_by_tag_name('body')
for _ in range(100):
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.2)
tweets = browser.find_elements_by_css_selector("[data-testid=\"tweet\"]")
for tweet in tweets:
    print(tweet.text)
The reason a second browser window opened is that your code created the driver twice, with this duplicate line:
browser = webdriver.Chrome()
In response to your question about gathering all the tweets and printing the texts, I made some code changes which are below.
tweets = []
for _ in range(16):
    tweets.extend(browser.find_elements_by_css_selector("[data-testid=\"tweet\"]"))
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(1)
tweets = list(dict.fromkeys(tweets))  # removes duplicates
for tweet in tweets:
    print(tweet.text)
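As an aside, the `dict.fromkeys` line works because Python dicts preserve insertion order (guaranteed since Python 3.7), so duplicates are dropped while the tweets keep their first-seen order - unlike `set()`, which would reshuffle them:

```python
# dict.fromkeys drops later duplicates but keeps first-seen order,
# unlike set(), which would reshuffle the tweets.
scraped = ["tweet A", "tweet B", "tweet A", "tweet C", "tweet B"]
unique = list(dict.fromkeys(scraped))
print(unique)  # prints ['tweet A', 'tweet B', 'tweet C']
```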
Since the DOM is loaded with a different set of tweets after each page down, I needed to gather the tweets after every page down and store them in a list. Once the page-down loop completed, I removed any duplicate tweets from the list and then iterated through it to print each tweet's text.
* Note: I changed your for loop to run only 16 times because, at the time I ran this, that was the maximum number of page downs needed. Ideally you would use a while loop, find a way to detect when you've reached the end of the results, and then break out of the loop.
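The "scroll until you reach the end" idea from the note can be sketched as a loop that stops once the page height stops growing. This is only a sketch, not tested against Twitter: the two callables stand in for Selenium calls (e.g. `browser.execute_script("return document.body.scrollHeight")` for the height and `body.send_keys(Keys.PAGE_DOWN)` for the scroll), and the `FakePage` class exists only so the loop can be demonstrated without a browser.

```python
def scroll_until_stable(get_height, scroll_once, max_rounds=50):
    # Keep scrolling until the page height stops growing.
    # With Selenium you might pass, for example:
    #   get_height  = lambda: browser.execute_script("return document.body.scrollHeight")
    #   scroll_once = lambda: body.send_keys(Keys.PAGE_DOWN)
    last = get_height()
    for _ in range(max_rounds):
        scroll_once()
        new = get_height()
        if new == last:  # nothing new loaded, so we are at the end
            break
        last = new
    return last

class FakePage:
    """Stands in for the browser so the loop can run without Chrome."""
    def __init__(self, final_height, step):
        self.height = step
        self.final = final_height
        self.step = step
    def get_height(self):
        return self.height
    def scroll_once(self):
        self.height = min(self.height + self.step, self.final)

page = FakePage(final_height=500, step=100)
print(scroll_until_stable(page.get_height, page.scroll_once))  # prints 500
```

Against a real page you would also want a short `time.sleep` between scrolling and re-reading the height, so the lazily loaded tweets have time to arrive before the loop decides it is done.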
Upvotes: 1