Reputation: 27
First, I tried bs4, but the table is not plain HTML text, which is why I moved to Selenium.
I'm trying to scrape the table data, but I don't know how to get the information.
What I have now is:
table = browser.find_element_by_id("name_list")
cell = table.find_elements_by_xpath("//td[@style='text-align:center']")
The table data is displayed like this:
<td style="text-align:center" class="left"><script
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>
I want to get "John". How can I do that?
Upvotes: 0
Views: 61
Reputation: 142631
You can do it with BeautifulSoup.
If the <td> contains a <script>, you can iterate over .children and take the second (last) element. The first element will be the <script>.
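from bs4 import BeautifulSoup as BS
html = '''<td style="text-align:center" class="left"><script
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''
soup = BS(html, 'html.parser')
td = soup.find('td')
# children are [<script>...</script>, '"John"']; index 1 is the text node after the script
text = list(td.children)[1]
# the sample HTML has literal quotes around the name, so strip them
print(text.strip('"'))  # John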
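Instead of indexing into .children, you can also step from the <script> to the node right after it with next_sibling. A minimal sketch on the same sample cell, assuming the name is always the text node that directly follows the script:

```python
from bs4 import BeautifulSoup as BS

html = '''<td style="text-align:center" class="left"><script
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''

soup = BS(html, 'html.parser')
script = soup.find('td').find('script')

# the text node directly after </script> holds the name
# (the sample has literal quotes around John, so strip them too)
name = script.next_sibling.strip().strip('"')
print(name)  # John
```

This avoids relying on the position of the text inside the cell's child list.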
Or you can find the <script> and extract() it, so the <td> is left with only the text:
from bs4 import BeautifulSoup as BS
html = '''<td style="text-align:center" class="left"><script
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''
soup = BS(html, 'html.parser')
td = soup.find('td')
# remove the <script> element from the tree; only the text node stays in <td>
td.find('script').extract()
# strip the literal quotes that surround the name in the sample HTML
text = td.text.strip('"')
print(text)  # John
If you need the text from Base64.decode("MTA0LjI0OC4xMTUuMjM2"), you can find the <script> and get its text. With slicing you can cut out MTA0LjI0OC4xMTUuMjM2 and decode it with the base64 module, which gives 104.248.115.236.
from bs4 import BeautifulSoup as BS
import base64
html = '''<td style="text-align:center" class="left"><script
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''
soup = BS(html, 'html.parser')
td = soup.find('td')
script = td.find('script').text
# script == 'document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))'
# skip the 30-char prefix 'document.write(Base64.decode("' and the 3-char suffix '"))'
text = script[30:-3]
text = base64.b64decode(text).decode()
print(text)  # 104.248.115.236
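The fixed slice [30:-3] only works while the script text has exactly that shape. A slightly more tolerant sketch pulls the argument of Base64.decode(...) out with a regular expression instead; the pattern is an assumption based on the single snippet in the question:

```python
import base64
import re

from bs4 import BeautifulSoup

html = '''<td style="text-align:center" class="left"><script
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''

soup = BeautifulSoup(html, 'html.parser')
script_code = soup.find('td').find('script').string

# capture the quoted argument of Base64.decode("...") wherever it sits in the script
match = re.search(r'Base64\.decode\("([^"]*)"\)', script_code)
ip = base64.b64decode(match.group(1)).decode() if match else None
print(ip)  # 104.248.115.236
```

If the site changes the wrapper code (for example, adds whitespace or another function call around it), the regex still finds the encoded value as long as it is passed to Base64.decode as a double-quoted string.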
Upvotes: 1
Reputation: 14135
You can get the text with the line below:
table.find_element_by_xpath(".//td[@style='text-align:center']").text
Make sure the leading . is in the XPath: it restricts the search to descendants of the current table element. Without it, // searches the whole document.
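The find_element_by_* helpers used above were deprecated in Selenium 4 and removed in later 4.x releases; current code uses find_element(By.XPATH, ...). One way to combine Selenium with the BeautifulSoup approach is to read each cell's innerHTML with get_attribute("innerHTML") and strip the <script> in Python. The helper name name_from_cell_html is hypothetical, and the sketch assumes the innerHTML matches the snippet in the question; if the page's own document.write has already run, the decoded address may also show up in the cell text. The Selenium part is left as comments because it needs a live browser session:

```python
from bs4 import BeautifulSoup


def name_from_cell_html(inner_html):
    """Return a cell's text with any <script> elements removed (hypothetical helper)."""
    fragment = BeautifulSoup(inner_html, "html.parser")
    for script in fragment.find_all("script"):
        script.extract()  # drop the script so only the cell text remains
    # the sample markup has literal quotes around the name
    return fragment.get_text(strip=True).strip('"')


# innerHTML of one cell, as shown in the question
sample = ('<script type="text/javascript">'
          'document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"')
print(name_from_cell_html(sample))  # John

# Selenium 4 usage (assumes an existing `browser` WebDriver, not run here):
# from selenium.webdriver.common.by import By
# table = browser.find_element(By.ID, "name_list")
# for td in table.find_elements(By.XPATH, ".//td[@style='text-align:center']"):
#     print(name_from_cell_html(td.get_attribute("innerHTML")))
```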
Upvotes: 0