Colin Zhong

Reputation: 757

Python: convert HTML table to JSON

I have an HTML table that I want to convert to JSON. I tried using pandas.read_html and BeautifulSoup, but neither gave me what I need. Any help would be appreciated.

Here is my original Python code:

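import json
import requests
from bs4 import BeautifulSoup

url = 'http://financials.morningstar.com/ajax/keystatsAjax.html?t=wja&culture=en-CA&region=CAN'
# The endpoint returns JSON whose "ksContent" field holds an HTML fragment
lm_json = requests.get(url).json()
ksContent = BeautifulSoup(lm_json["ksContent"], "html.parser")
table = ksContent.find("table", {'class': "r_table1 text2"})
# table.text drops all tags, so only a flat string is serialized here
jsonD = json.dumps(table.text)
jsonL = json.loads(jsonD)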

The 'table' variable holds the HTML table, but the JSON conversion produces plain text with no table structure.

Upvotes: 2

Views: 5476

Answers (2)

water_ak47

Reputation: 1306

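This can be solved using pandas:

import io
import pandas as pd

# `result` is the parsed BeautifulSoup document (ksContent in the question)
first_table = result.find("table")
# read_html returns a list of DataFrames, one per <table> it finds
dfs = pd.read_html(io.StringIO(str(first_table)))

# "w" overwrites the file, so it always holds one valid JSON document
with open("./table.json", "w") as f:
    f.write(dfs[0].to_json(orient='records'))

This works for me.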

Upvotes: 1

Shane Fontaine

Reputation: 2439

jsonD = json.dumps(table.text) serializes the table's text content (the .text attribute has already stripped the markup) into a JSON string. jsonL = json.loads(jsonD) then parses that JSON string back into a regular string. The round trip is a no-op: any escaping done by dumps() is reverted by loads(), so jsonL contains the same data as table.text.
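
To illustrate, here is a minimal sketch of both the no-op round trip and one way to keep the table's structure. It uses only the standard library's html.parser rather than BeautifulSoup or pandas, so it is a swapped-in technique and not the approach from the question. SAMPLE_HTML, TableRows and table_to_records are hypothetical names made up for this example, the real Morningstar markup may look different, and the sketch assumes the first row of the table is the header row.

```python
import json
from html.parser import HTMLParser

# Hypothetical stand-in for the table in the question; the real markup may differ.
SAMPLE_HTML = """
<table class="r_table1 text2">
  <tr><th>Metric</th><th>2015</th><th>2016</th></tr>
  <tr><td>Revenue</td><td>100</td><td>120</td></tr>
  <tr><td>Net income</td><td>10</td><td>15</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every th/td cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text pieces of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Return one dict per data row, keyed by the cells of the first row."""
    parser = TableRows()
    parser.feed(html)
    header, *body = parser.rows  # assumes the first row is the header
    return [dict(zip(header, row)) for row in body]


# Round trip on a plain string: loads() undoes dumps(), so nothing changes.
text = "Revenue 100 120"
assert json.loads(json.dumps(text)) == text

# Serializing structured rows instead gives JSON that keeps the table's shape.
records = table_to_records(SAMPLE_HTML)
print(json.dumps(records, indent=2))
```

The key difference is what gets passed to json.dumps(): a flat string stays a flat string, while a list of dicts becomes a JSON array of objects, one per table row. All cell values stay strings here; converting them to numbers is left out of the sketch.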

Upvotes: 0
