Reputation: 2165
I am trying to extract tables from an HTML document using the xpath module from the webscraping package in Python. If I print the downloaded HTML, I see the full DOM as expected. However, when I use xpath.get, it gives me a tbody section, but not the one I want, and certainly not the only one that should be there. Here is the script:
import requests
from webscraping import download, xpath
D = download.Download()
url = 'http://labs.mementoweb.org/timemap/json/http://www.awebsiteimscraping.com'
r = requests.get(url)
data = []
mementos = r.json()['mementos']['list']
for memento in mementos:
    data.append(D.get(memento['uri']))
# print xpath.get(data[10], '//table')
print type(data[0])
# print data[10]
print len(data)
I'm new to this, so I don't know if it matters, but the type of each element in 'data' is str.
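One way to check how many table and tbody elements the raw HTML really contains, independent of the xpath module, is a small counter built on the standard library. This is only a sketch, written for Python 3 rather than the Python 2 of the script above; the names TagCounter and count_tags and the sample HTML are made up for illustration and are not part of the webscraping package.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often selected start tags appear in an HTML string."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in self.tags:
            self.counts[tag] += 1


def count_tags(html, tags=("table", "tbody")):
    """Return a dict mapping each requested tag to its number of occurrences."""
    parser = TagCounter(tags)
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


# Hypothetical sample standing in for one element of `data`.
sample = (
    "<html><body>"
    "<table><tbody><tr><td>a</td></tr></tbody></table>"
    "<table><tr><td>b</td></tr></table>"
    "</body></html>"
)
print(count_tags(sample))
```

If the tbody count is lower than what the browser's inspector shows, that is expected: browsers insert tbody elements into the DOM even when they are absent from the raw markup.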
Upvotes: 0
Views: 96
Reputation: 2334
Convert each element of data from str to dict using json.loads(). Note that this only works if the elements are JSON strings, not raw HTML.
Try this:
import requests
import json
from webscraping import download, xpath
D = download.Download()
url = 'http://labs.mementoweb.org/timemap/json/http://www.awebsiteimscraping.com'
r = requests.get(url)
data = []
mementos = r.json()['mementos']['list']
for memento in mementos:
    data.append(D.get(memento['uri']))
# print xpath.get(data[10], '//table')
print type(data[0])
# print data[10]
print len(data)
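# json.loads() takes a single string, not a list, so parse each element.
# This only works if every element is valid JSON; raw HTML raises ValueError.
json_data = [json.loads(item) for item in data]
print type(json_data[0])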
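For reference, json.loads() accepts a single string (or bytes), and only one that contains valid JSON. Here is a small Python 3 sketch of what happens with the three kinds of input involved here; try_parse is a hypothetical helper name and the sample values are made up.

```python
import json


def try_parse(value):
    """Return the parsed JSON, or the name of the exception json.loads raised."""
    try:
        return json.loads(value)
    except (TypeError, ValueError) as exc:
        # JSONDecodeError is a subclass of ValueError, so it is caught here too.
        return type(exc).__name__


# A JSON string parses into a dict.
print(try_parse('{"uri": "http://example.com"}'))
# A list is not a valid argument at all.
print(try_parse(["<html></html>"]))
# An HTML string is a str, but it is not JSON.
print(try_parse("<html><table></table></html>"))
```

So if the elements of data are downloaded HTML pages, they need an HTML parser rather than json.loads().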
Upvotes: 2