I have a simple function that prints the contents of a table retrieved via XPath from a website:
import traceback
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType

def get_hotbird_13e():
    # Downloads an up to date channel/country map for Hotbird 13e.
    try:
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        driver = webdriver.Chrome(chrome_options=chrome_options)
        driver.get("http://www.eutelsat.com/deploy_tvLineUp/struts/advancedSearch.do?orbitalPositionId=13%B0%20EAST&Langue=EN")
        link_xpath = '/html/body/div[1]/div[3]/div/table'
        link_path = driver.find_element_by_xpath(link_xpath).text
        driver.quit()
        print(link_path)
    except Exception as exc:
        print(traceback.format_exc())

get_hotbird_13e()
...however, this returns the text of every cell in the table as one string, with a space used as the separator. Because some of the field values contain spaces themselves, I cannot pick out the individual field values.
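A small pure-Python sketch of why the flattened string cannot be split back reliably. The cell values are made up for illustration (not taken from the site): two different rows produce the same space-separated text, so no splitting rule can tell them apart, whereas joining with a delimiter that never appears inside a value keeps the cells recoverable.

```python
# Two hypothetical rows whose cells differ but whose space-joined text is identical.
row_a = ["TVN TURBO", "TVN"]
row_b = ["TVN", "TURBO TVN"]

flat_a = " ".join(row_a)
flat_b = " ".join(row_b)
print(flat_a)            # TVN TURBO TVN
print(flat_a == flat_b)  # True: the cell boundaries are lost

# Joining with ", " round-trips, assuming no value itself contains ", ".
joined = ", ".join(row_a)
print(joined.split(", ") == row_a)  # True
```

In other words, the separator has to be chosen while the cells are still separate elements, which means addressing the cells (or rows) individually rather than reading `.text` of the whole table.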
What do I need to change in my code so that output such as:
TVN TURBO TVN 13° EAST HOTBIRD 13C POLISH HD CONAX / IRDETO / MEDIAGUARD / NAGRAVISION / VIACCESS
...becomes:
TVN TURBO, TVN, 13° EAST, HOTBIRD 13C, POLISH, HD, CONAX / IRDETO / MEDIAGUARD / NAGRAVISION / VIACCESS
Thanks
XPath 2.0 one-liner solution:
tokenize(replace(replace(substring-after(normalize-space(string-join(//tr//text()[normalize-space()]|//tr[@class]/@class,",")),",")," ?, ?",","),"oneven","even"),",even,")
Output:
String='112 UKRAÏNA,Globecast,13° EAST,HOTBIRD 13C,UKRAINIAN,HD,CLEAR'
String='13 ULICA,Cyfrowy Polsat,13° EAST,HOTBIRD 13C,POLISH,HD,CONAX / IRDETO / MEDIAGUARD / NAGRAVISION / VIACCESS'
String='20 MEDIASET,Mediaset,13° EAST,HOTBIRD 13C,ITALIAN,SD,NAGRAVISION / VIDEOGUARD'
String='20 MEDIASET,Mediaset,13° EAST,HOTBIRD 13E,ITALIAN,HD,NAGRAVISION / VIDEOGUARD'
String='2M MONDE,Globecast,13° EAST,HOTBIRD 13B,ARABIC,SD,CLEAR,GENERAL'
String='2M MONDE,Globecast,13° EAST,HOTBIRD 13C,ARABIC,SD,CLEAR,GENERAL'
String='4 FUN DANCE,Cyfrowy Polsat,13° EAST,HOTBIRD 13C,POLISH,SD,CLEAR,MUSIC'
String='4 FUN GOLD,Cyfrowy Polsat,13° EAST,HOTBIRD 13C,POLISH,SD,CLEAR,MUSIC'
String='4 FUN TV,Cyfrowy Polsat,13° EAST,HOTBIRD 13C,POLISH,SD,CLEAR,MUSIC'
...
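The one-liner above relies on XPath 2.0 functions (`string-join`, `replace`, `tokenize`). Browsers only evaluate XPath 1.0, so it cannot be passed to Selenium's `find_element_by_xpath`; it needs a separate XPath 2.0 or later processor. The expression appears to use each row's `class` attribute (the values `even` and `oneven` are inferred from the expression itself, not checked against the page) as a row delimiter in the document-order stream of text nodes. A rough Python analogue of that idea, working on a hypothetical token list rather than a real page, might look like this:

```python
def rows_from_tokens(tokens, markers=frozenset({"even", "oneven"})):
    """Group a document-order stream of row-class markers and cell texts into rows.

    Assumptions of this sketch: every data row is preceded by exactly one marker
    token, and tokens before the first marker (e.g. header text) are dropped.
    Caveat: a cell whose text is exactly a marker word would be misread.
    """
    rows = []
    current = None  # stays None until the first marker is seen
    for token in tokens:
        text = " ".join(token.split())  # collapse whitespace, like normalize-space()
        if not text:
            continue  # skip whitespace-only text nodes
        if text in markers:
            if current:
                rows.append(",".join(current))
            current = []
        elif current is not None:
            current.append(text)
    if current:
        rows.append(",".join(current))
    return rows


# Hypothetical token stream: header cells, then two rows, each preceded by its class value.
tokens = ["Channel", "Provider", "oneven", "TVN TURBO", " TVN ", "13°  EAST",
          "even", "13 ULICA", "\n", "Cyfrowy Polsat"]
for line in rows_from_tokens(tokens):
    print(line)
```

Working on a list of tokens instead of one comma-joined string avoids breaking a row when a cell value happens to contain a comma.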
Another solution:
contents = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='listresult']//tr[*]")))
for item in contents:
    print(item.text)
Note: add the following imports to your solution:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
Fetch the data row by row, collect the values of all columns of each row in a list, and then join them with ",".
Code:
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get("http://www.eutelsat.com/deploy_tvLineUp/struts/advancedSearch.do?orbitalPositionId=13%B0%20EAST&Langue=EN")
WebDriverWait(driver,15).until(EC.presence_of_element_located((By.CSS_SELECTOR,".listresult")))
for row in driver.find_elements_by_xpath("//table[@class='listresult']//tr")[1:]:
    rowwisedata = [td.text.strip() for td in row.find_elements_by_xpath(".//td") if td.text != ""]
    print(','.join(rowwisedata))
Output:
112 UKRAÏNA,Globecast,13° EAST,HOTBIRD 13C,UKRAINIAN,HD,CLEAR
13 ULICA,Cyfrowy Polsat,13° EAST,HOTBIRD 13C,POLISH,HD,CONAX / IRDETO / MEDIAGUARD / NAGRAVISION / VIACCESS
20 MEDIASET,Mediaset,13° EAST,HOTBIRD 13C,ITALIAN,SD,NAGRAVISION / VIDEOGUARD
20 MEDIASET,Mediaset,13° EAST,HOTBIRD 13E,ITALIAN,HD,NAGRAVISION / VIDEOGUARD
2M MONDE,Globecast,13° EAST,HOTBIRD 13B,ARABIC,SD,CLEAR,GENERAL
2M MONDE,Globecast,13° EAST,HOTBIRD 13C,ARABIC,SD,CLEAR,GENERAL
4 FUN DANCE,Cyfrowy Polsat,13° EAST,HOTBIRD 13C,POLISH,SD,CLEAR,MUSIC
4 FUN GOLD,Cyfrowy Polsat,13° EAST,HOTBIRD 13C,POLISH,SD,CLEAR,MUSIC
4 FUN TV,Cyfrowy Polsat,13° EAST,HOTBIRD 13C,POLISH,SD,CLEAR,MUSIC
6TER,Bis TV,13° EAST,HOTBIRD 13B,FRENCH,SD,VIACCESS
And so on...
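The same row-wise idea can be tried offline, without a browser. The sketch below uses the standard library's `xml.etree.ElementTree` on a small, hypothetical, well-formed table snippet (the real page's markup is not reproduced here, and real HTML is often not valid XML, so this is for illustration only). Like the code above, it skips the first row and drops empty cells; it joins with ", " to match the format asked for in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page's table (not the real markup).
SNIPPET = """
<div>
  <table class="listresult">
    <tr><th>Channel</th><th>Provider</th><th>Position</th></tr>
    <tr><td>TVN TURBO</td><td> TVN </td><td>13° EAST</td><td></td></tr>
    <tr><td>6TER</td><td>Bis TV</td><td>13° EAST</td><td></td></tr>
  </table>
</div>
"""


def table_rows(markup, sep=", "):
    """Return one joined string per data row of the 'listresult' table."""
    root = ET.fromstring(markup.strip())
    table = root.find(".//table[@class='listresult']")
    lines = []
    for row in table.findall(".//tr")[1:]:  # skip the header row, like [1:] above
        # itertext() collects all text inside a cell; strip it and skip empty cells
        cells = ["".join(td.itertext()).strip() for td in row.findall("td")]
        lines.append(sep.join(c for c in cells if c))
    return lines


for line in table_rows(SNIPPET):
    print(line)
```

Stripping before testing for emptiness is slightly stricter than the `td.text != ""` check above: a cell that contains only whitespace is dropped as well.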
If you want each cell of the table separately, you have to use an XPath that points to each cell. Try this approach:
link_xpath = '/html/body/div[1]/div[3]/div/table//tr/td'
cells = driver.find_elements_by_xpath(link_xpath)
for cell in cells:
    print(cell.text)
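This prints one cell per line, so the row grouping has to be restored separately. If every row is known to contain the same number of `td` elements, empty ones included (an assumption you would need to verify on the page), the flat list could be regrouped as in this sketch. The helper name `regroup`, the column count and the cell texts are all hypothetical.

```python
def regroup(cell_texts, ncols, sep=", "):
    """Split a flat, row-major list of cell texts into joined rows of ncols cells."""
    if ncols <= 0 or len(cell_texts) % ncols:
        raise ValueError("cell count is not a multiple of ncols")
    rows = []
    for start in range(0, len(cell_texts), ncols):
        row = [text.strip() for text in cell_texts[start:start + ncols]]
        rows.append(sep.join(t for t in row if t))  # drop empty cells
    return rows


# Made-up flat list, as it might come from [cell.text for cell in cells], 3 columns per row.
flat = ["TVN TURBO", "TVN", "HD", "6TER", "Bis TV", ""]
print(regroup(flat, 3))
```

Iterating over rows first and then over each row's cells, as in the row-wise answer, avoids having to know the column count at all.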