natamay462

Reputation: 3

Web scraping with selenium to txt

I would like to scrape the ids from this page: https://www.flashscore.co.uk/football/russia/premier-league/results/ Then I want to replace the g_1_ prefix of each id with https://www.flashscore.com/match/ and write the resulting URLs to a txt file.

I used this code

matches=WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[starts-with(@id,'g_1_')]")))

for match in matches:
    g1 = matches.replace("g_1_", "https://www.flashscore.com/match/")
    print(g1)

But I got this error

AttributeError: 'list' object has no attribute 'replace'

[Image: the id that I want to scrape]

Upvotes: 0

Views: 341

Answers (2)

chitown88

Reputation: 28630

First, as stated in the comments, .replace() is a string method. Your matches variable is a list (of WebElements), which is why you get the error 'list' object has no attribute 'replace'. You need to iterate through the list of WebElements, which your for match in matches: loop already does, and then grab each element's id attribute as a string with .get_attribute() so that you can call replace() on it.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


#Initializing the webdriver
options = webdriver.ChromeOptions()

#Uncomment the line below if you'd like to scrape without a new Chrome window every time.
#options.add_argument('headless')

#Change the path to where chromedriver is in your home folder.
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe', options=options)
driver.maximize_window()

url = 'https://www.flashscore.co.uk/football/russia/premier-league/results/'
driver.get(url)
matches=WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[starts-with(@id,'g_1_')]")))

for match in matches:
    g1 = match.get_attribute('id')
    g1 = g1.replace("g_1_", "https://www.flashscore.com/match/")
    print(g1)

#quit() ends the whole browser session; close() only closes the current window.
driver.quit()

You could also combine the two lines inside the loop into a one-liner:

g1 = match.get_attribute('id').replace("g_1_", "https://www.flashscore.com/match/")

Output:

https://www.flashscore.com/match/hWhb9Uyh
https://www.flashscore.com/match/rLoB6SLA
https://www.flashscore.com/match/zer38lib
https://www.flashscore.com/match/Eos77864
https://www.flashscore.com/match/4zzK46jN
https://www.flashscore.com/match/tdkfAAMo
https://www.flashscore.com/match/MBpF5nyH
https://www.flashscore.com/match/IwvO3Q5T
https://www.flashscore.com/match/nysS6yGg
https://www.flashscore.com/match/f1pz5Fp6
https://www.flashscore.com/match/jTwq3gFI
https://www.flashscore.com/match/QLhJ8cos
https://www.flashscore.com/match/0voW5eVa
https://www.flashscore.com/match/Yiqv4ZaC
https://www.flashscore.com/match/4CiN7H0m
https://www.flashscore.com/match/Sh1CoRqo
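The question also asks for the URLs to end up in a txt file, which the code above does not do yet. Here is a minimal sketch of that last step, assuming the ids have already been collected as strings (for example with match.get_attribute('id')). The helper names ids_to_urls and save_urls and the file name match_urls.txt are made up for illustration, and the two sample ids are taken from the output above.

```python
from pathlib import Path

MATCH_URL_PREFIX = "https://www.flashscore.com/match/"


def ids_to_urls(element_ids, id_prefix="g_1_"):
    """Turn ids such as 'g_1_hWhb9Uyh' into match URLs.

    Ids that do not start with id_prefix are skipped.
    """
    return [
        MATCH_URL_PREFIX + element_id[len(id_prefix):]
        for element_id in element_ids
        if element_id.startswith(id_prefix)
    ]


def save_urls(urls, path="match_urls.txt"):
    """Write one URL per line to a UTF-8 text file (hypothetical file name)."""
    Path(path).write_text("".join(url + "\n" for url in urls), encoding="utf-8")


if __name__ == "__main__":
    # With Selenium this list would come from the scraped elements, e.g.
    # ids = [match.get_attribute("id") for match in matches]
    ids = ["g_1_hWhb9Uyh", "g_1_rLoB6SLA"]
    save_urls(ids_to_urls(ids))
```

Slicing off the prefix instead of calling replace() only touches the start of the id, so an id that happened to contain g_1_ somewhere else would not be altered.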

Upvotes: 1

undetected Selenium

Reputation: 193198

This error message...

AttributeError: 'list' object has no attribute 'replace'

...implies that your program invoked the replace() method on a list, whereas replace() is a string method that replaces a specified phrase with another specified phrase.
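The difference can be reproduced without Selenium. The sketch below uses a plain list of strings as a stand-in for the scraped values (the sample ids are only illustrative): calling replace() on the list fails, while calling it on each string works.

```python
# Stand-in for scraped values; a real run would get these from the page.
ids = ["g_1_hWhb9Uyh", "g_1_rLoB6SLA"]

try:
    ids.replace("g_1_", "")  # a list has no replace() method
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# replace() exists on str, so apply it to each item instead.
cleaned = [one_id.replace("g_1_", "") for one_id in ids]
print(cleaned)
```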

You need to invoke the replace() method on a string taken from each element in the list. Since g_1_ is part of each element's id attribute rather than its visible text, that string is the id.


Solution

Instead of collecting the elements, you can collect the id attribute of each element and create a list of strings. Effectively your code block will be:

match_ids = [my_elem.get_attribute("id") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[starts-with(@id,'g_1_')]")))]
for match_id in match_ids:
    g1 = match_id.replace("g_1_", "https://www.flashscore.com/match/")
    print(g1)

Upvotes: 2
