user137717

Reputation: 2165

XPATH Not Extracting Tables From HTML Python

I am trying to extract tables from an HTML document using the xpath module from the webscraping package in Python. If I print the downloaded HTML, I see the full DOM as expected. However, when I use xpath.get, it gives me a tbody section, but not the one I want, and certainly not the only one that should be there. Here is the script:

import requests
from webscraping import download, xpath
D = download.Download()
url = 'http://labs.mementoweb.org/timemap/json/http://www.awebsiteimscraping.com'
r = requests.get(url)
data = []
mementos = r.json()['mementos']['list']
for memento in mementos:
    data.append(D.get(memento['uri']))
# print xpath.get(data[10], '//table')
print type(data[0])
# print data[10]
print len(data)

I'm new to this, so idk if it matters, but the type of each element in 'data' is str.

Upvotes: 0

Views: 96

Answers (1)

Prashant Puri

Reputation: 2334

Convert the type of data to dict using json.loads().

Try this:

import requests
import json
from webscraping import download, xpath
D = download.Download()
url = 'http://labs.mementoweb.org/timemap/json/http://www.awebsiteimscraping.com'
r = requests.get(url)
data = []
mementos = r.json()['mementos']['list']
for memento in mementos:
    data.append(D.get(memento['uri']))
# print xpath.get(data[10], '//table')
print type(data[0])
# print data[10]
print len(data)
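# json.loads expects a single string, not a list, so parse each page separately.
# This only succeeds if each downloaded page is valid JSON.
json_data = [json.loads(page) for page in data]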
print type(json_data[0])
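
Since each element of data is an HTML string, it may also help to check which tables a page really contains by parsing it directly. Below is a minimal sketch, not the webscraping package's own xpath module: it uses Python 3 and only the standard library's html.parser, and the names TableExtractor, extract_tables, and sample are hypothetical, made up for illustration. It assumes simple, non-nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> in an HTML string."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per table; each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return a list of tables, each a list of rows, each a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical sample page standing in for one element of `data`.
sample = """
<html><body>
<table><tr><th>Year</th><th>Count</th></tr><tr><td>2014</td><td>3</td></tr></table>
<table><tbody><tr><td>a</td><td>b</td></tr></tbody></table>
</body></html>
"""

tables = extract_tables(sample)
print(len(tables))
for table in tables:
    print(table)
```

Calling extract_tables(data[10]) on a downloaded page would show how many tables the raw HTML holds, which can be compared against what xpath.get returns. Nested tables are not handled correctly by this sketch; for those, a full XPath engine is probably the better tool.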

Upvotes: 2
