DoingGreat

Reputation: 27

Scrape data from table

First, I tried bs4, but the table is not plain HTML text, which is why I moved to Selenium.

I'm trying to scrape the table data, but I don't know how to get the information.

What I have now is:

table = browser.find_element_by_id("name_list")
cell = table.find_elements_by_xpath("//td[@style='text-align:center']")

The table data is displayed like this:

<td style="text-align:center" class="left"><script   
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>

I want to get "John". How can I do that?

Upvotes: 0

Views: 61

Answers (2)

furas

Reputation: 142631

You can do it with BeautifulSoup.

If there is a <script> inside the <td>, you can iterate over .children and take the second (last) element; the first one is the <script>.

from bs4 import BeautifulSoup as BS

html = '''<td style="text-align:center" class="left"><script   
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''

soup = BS(html, 'html.parser')
td = soup.find('td')

text = list(td.children)[1]

print(text)  # "John" (the quotes are part of the text in this sample HTML)

Or you can find the <script> and extract it, so that the <td> contains only the text.

from bs4 import BeautifulSoup as BS

html = '''<td style="text-align:center" class="left"><script   
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''

soup = BS(html, 'html.parser')
td = soup.find('td')

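# Remove the <script> from the tree so only the text node remains
td.find('script').extract()
text = td.text

print(text)  # "John" (the quotes are part of the text in this sample HTML)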

If you need the text from Base64.decode("MTA0LjI0OC4xMTUuMjM2"), you can find the <script> and get its content as text. Slicing gives you MTA0LjI0OC4xMTUuMjM2, which you can decode with the base64 module to get 104.248.115.236.

from bs4 import BeautifulSoup as BS
import base64

html = '''<td style="text-align:center" class="left"><script   
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''

soup = BS(html, 'html.parser')
td = soup.find('td')

script = td.find('script').text

# Skip the 30 characters of 'document.write(Base64.decode("' and the trailing '"))'
text = script[30:-3]

text = base64.b64decode(text).decode()

print(text) # 104.248.115.236
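
The fixed slice above depends on the exact length of the JavaScript call, so it breaks if the page adds a space or renames something. A minimal sketch of a less brittle variant, assuming the script always contains a call of the form Base64.decode("..."), pulls the payload out with a regular expression instead. The script string is hard-coded here so the example runs without bs4; in practice it would come from td.find('script').

```python
import base64
import re

# Script text as it appears inside the <td> from the question.
script = 'document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))'

# Capture whatever sits between the quotes of Base64.decode("...").
# The pattern is an assumption about how the page writes the call.
match = re.search(r'Base64\.decode\("([^"]+)"\)', script)

if match:
    decoded = base64.b64decode(match.group(1)).decode()
else:
    decoded = None  # no Base64.decode(...) call found in this script

print(decoded)  # 104.248.115.236
```

The None branch lets you skip cells whose script does not match, rather than decoding garbage.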

Upvotes: 1

supputuri

Reputation: 14135

You can get the text with the line below.

table.find_element_by_xpath(".//td[@style='text-align:center']").text

Make sure the leading . is in the XPath so that the search is restricted to the current table node.
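
To see how the search scope changes the result without starting a browser, here is a rough sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Selenium. The two-table page and the id "other" are made up for illustration, and ElementTree supports only a subset of XPath, so this shows the idea of scoping rather than Selenium's exact behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two tables; only the second has id="name_list".
page = ET.fromstring(
    "<body>"
    "<table id='other'><tr><td style='text-align:center'>Ignored</td></tr></table>"
    "<table id='name_list'><tr><td style='text-align:center'>John</td></tr></table>"
    "</body>"
)

table = page.find(".//table[@id='name_list']")

# Searching from the page root matches cells in both tables.
all_cells = [td.text for td in page.findall(".//td[@style='text-align:center']")]

# Searching from the table node with a leading "." stays inside that table.
table_cells = [td.text for td in table.findall(".//td[@style='text-align:center']")]

print(all_cells)    # ['Ignored', 'John']
print(table_cells)  # ['John']
```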

Upvotes: 0
