Harsha Biyani

Reputation: 7268

python pandas read HTML table

pd.read_html is returning only the first 5 rows of the first table (index 0). How can I read the whole table using pd.read_html?

I have tried the code below:

import pandas as pd
import requests
from urllib.error import HTTPError

try:
    url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
    html_data2 = requests.get(url)
    df = pd.read_html(html_data2.text)[0]
    data = df.head()
    print(data)
except HTTPError as http_error:
    print("HTTP error: ", http_error)

Upvotes: 1

Views: 684

Answers (1)

anky

Reputation: 75150

You are assigning df.head() to data, and head() returns only the first 5 rows of a DataFrame. Instead, you can do:

url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df  # not df.head()

Also, pandas can read HTML directly from a URL, so you can just do:

data = pd.read_html(r"https://clinicaltrials.gov/ct2/history/NCT02954874")[0]

and put that inside your try/except block.
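One caveat for the requests-based version: requests.get does not raise on a 4xx/5xx status by itself, and the error it raises after raise_for_status() is requests.exceptions.HTTPError, not urllib.error.HTTPError. A minimal sketch of how that could look, assuming a hypothetical helper named read_first_table (not part of pandas or requests) and a `get` parameter that exists only so the helper can be exercised without network access:

```python
from io import StringIO

import pandas as pd
import requests


def read_first_table(url, get=requests.get):
    """Return the first HTML table found at `url` as a full DataFrame.

    `read_first_table` and its `get` parameter are illustrative names,
    not an existing pandas or requests API.
    """
    response = get(url, timeout=30)
    # requests does not raise on 4xx/5xx on its own; this call does,
    # with requests.exceptions.HTTPError.
    response.raise_for_status()
    # Wrapping the text in StringIO avoids the deprecation warning that
    # recent pandas versions emit for literal HTML strings.
    tables = pd.read_html(StringIO(response.text))
    # Return the whole table, not tables[0].head().
    return tables[0]
```

Calling read_first_table(url) inside the try block and catching requests.exceptions.HTTPError in the except clause should then report failed requests as intended.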

Outputs:

url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df.head()
print(data)

   Version   A   B     Submitted Date                               Changes
0        1 NaN NaN   November 3, 2016  Nothing (earliest Version on record)
1        2 NaN NaN  November 24, 2016   Contacts/Locations and Study Status
2        3 NaN NaN  November 28, 2016   Recruitment Status and Study Status
3        4 NaN NaN  December 15, 2016   Contacts/Locations and Study Status
4        5 NaN NaN  December 19, 2016   Contacts/Locations and Study Status

Vs

url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df
print(data)

     Version   A   B     Submitted Date                               Changes
0          1 NaN NaN   November 3, 2016  Nothing (earliest Version on record)
1          2 NaN NaN  November 24, 2016   Contacts/Locations and Study Status
2          3 NaN NaN  November 28, 2016   Recruitment Status and Study Status
3          4 NaN NaN  December 15, 2016   Contacts/Locations and Study Status
4          5 NaN NaN  December 19, 2016   Contacts/Locations and Study Status
..       ...  ..  ..                ...                                   ...
558      559 NaN NaN  December 19, 2019   Contacts/Locations and Study Status
559      560 NaN NaN  December 20, 2019   Contacts/Locations and Study Status
560      561 NaN NaN  December 23, 2019   Contacts/Locations and Study Status
561      562 NaN NaN  December 25, 2019   Contacts/Locations and Study Status
562      563 NaN NaN  December 27, 2019   Contacts/Locations and Study Status

[563 rows x 5 columns]
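
Note that the `..` row in the second output is only pandas abbreviating the printed display; the DataFrame itself holds all 563 rows. If every row should appear in the printout, one option is to lift the display limit temporarily. A small sketch, using a stand-in DataFrame (its `Version` values are generated here, not scraped) so that it runs without network access:

```python
import pandas as pd

# Stand-in for the scraped table: 563 rows, matching the row count above.
df = pd.DataFrame({"Version": range(1, 564)})

# Default rendering abbreviates long frames with a row of dots.
short_text = repr(df)

# Lifting the row limit inside a context manager renders every row
# and restores the previous setting when the block exits.
with pd.option_context("display.max_rows", None):
    full_text = repr(df)

# to_string() also renders every row, regardless of display options.
all_rows_text = df.to_string()
```

Printing full_text or all_rows_text shows the complete table, while len(df) confirms the row count without printing anything.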

Upvotes: 2
