Reputation: 757
I have an HTML table like this. I tried using pandas.read_html and BeautifulSoup, but could not convert it to JSON. It is really frustrating, please help!
Here is my original Python code:
import json
import requests
from bs4 import BeautifulSoup
url = 'http://financials.morningstar.com/ajax/keystatsAjax.html?t=wja&culture=en-CA&region=CAN'
lm_json = requests.get(url).json()
ksContent = BeautifulSoup(lm_json["ksContent"],"html.parser")
table = ksContent.find("table", {'class': "r_table1 text2"})
jsonD = json.dumps(table.text)
jsonL = json.loads(jsonD)
The variable table holds the HTML table, but the JSON conversion turns it into plain text.
Upvotes: 2
Views: 5476
Reputation: 1306
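This can be solved with pandas:
import pandas as pd
from io import StringIO
# result is assumed to be the parsed BeautifulSoup object
first_table = result.find("table")
# read_html returns a list of DataFrames; StringIO avoids the literal-HTML deprecation warning in recent pandas
df = pd.read_html(StringIO(str(first_table)))
with open("./table.json", "a+") as f:
    f.write(df[0].to_json(orient='records'))
This works for me.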
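If pandas is not available, the same idea (rows of the table become JSON records keyed by the header row) can be sketched with only the standard library. This is not the method from the answer above but a swapped-in alternative built on html.parser.HTMLParser. The class name TableRows, the helper table_to_records, and the sample HTML are all hypothetical, and the sketch assumes a single, non-nested table whose first row is the header.

```python
import json
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Return a list of dicts, using the first row as the keys."""
    parser = TableRows()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample data, not the actual Morningstar response.
sample_html = """
<table class="r_table1 text2">
  <tr><th>Metric</th><th>2015</th><th>2016</th></tr>
  <tr><td>Revenue</td><td>4,029</td><td>4,123</td></tr>
  <tr><td>Net Income</td><td>368</td><td>295</td></tr>
</table>
"""

records = table_to_records(sample_html)
table_json = json.dumps(records)
print(table_json)
```

In the question's code, str(table) (the markup, not table.text) would be the value to pass to table_to_records.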
Upvotes: 1
Reputation: 2439
jsonD = json.dumps(htmlContent.text)
converts the raw HTML content into a JSON string representation.
jsonL = json.loads(jsonD)
parses the JSON string back into a regular string/unicode object. This results in a no-op, as any escaping done by dumps() is reverted by loads(). jsonL contains the same data as htmlContent.text.
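The round trip described here can be checked with a small sketch. The sample string table_text is a hypothetical stand-in for the text extracted from the table; only json.dumps and json.loads from the standard library are used.

```python
import json

# Hypothetical stand-in for the text extracted from the table.
table_text = 'Revenue "CAD Mil"\n4,029\t4,123'

json_d = json.dumps(table_text)  # a JSON string literal: quotes and escapes added
json_l = json.loads(json_d)      # the escapes are undone again

print(json_d != table_text)      # True here: dumps() wrapped and escaped the text
print(json_l == table_text)      # the round trip gives back the original text
print(type(json_l).__name__)     # still a str, not a dict or a list
```

To get structured JSON, the table has to be turned into rows and cells first; serializing the text alone cannot recover that structure.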
Upvotes: 0