Reputation: 7268
pd.read_html is reading only the first 5 rows of the first (zeroth) table. How can I read the whole table using pd.read_html?
I have tried the code below:
import pandas as pd
import requests
from urllib.error import HTTPError
try:
    url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
    html_data2 = requests.get(url)
    df = pd.read_html(html_data2.text)[0]
    data = df.head()
    print(data)
except HTTPError as http_error:
    print("HTTP error: ", http_error)
Upvotes: 1
Views: 684
Reputation: 75150
You are assigning data as df.head(), which returns only the first 5 rows of a DataFrame. Instead, you can do:
url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df  # not df.head()
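To see the difference without hitting the site, here is a minimal sketch. The one-column DataFrame is a made-up stand-in for the scraped table (563 rows, matching the row count shown further down), not the real data:

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: 563 numbered versions.
df = pd.DataFrame({"Version": range(1, 564)})

preview = df.head()    # head() defaults to n=5, so only the first 5 rows
longer = df.head(10)   # pass n explicitly to get a different number of rows
data = df              # the whole table, nothing dropped

print(len(preview), len(longer), len(data))
```

So head() is only a preview helper; the full table was already in df.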
Also, pandas can read HTML directly from a URL, so you can just do:
data = pd.read_html(r"https://clinicaltrials.gov/ct2/history/NCT02954874")[0]
and put that inside your try/except block.
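A rough sketch of how that could look inside try/except. The function names are my own, and I am assuming a reasonably recent pandas, where read_html fetches URLs through urllib (so a failed request should surface as urllib.error.HTTPError) and where literal HTML strings are expected to be wrapped in StringIO. An HTML parser such as lxml needs to be installed either way:

```python
from io import StringIO
from urllib.error import HTTPError, URLError

import pandas as pd


def first_table_from_html(html_text: str) -> pd.DataFrame:
    """Parse HTML text and return the first table with all of its rows."""
    # StringIO avoids the deprecation warning newer pandas versions emit
    # when a literal HTML string is passed to read_html.
    return pd.read_html(StringIO(html_text))[0]


def first_table_from_url(url: str):
    """Return the first table found at url, or None if the request fails."""
    try:
        # pandas downloads the page itself; no requests.get needed here.
        return pd.read_html(url)[0]
    except HTTPError as http_error:  # assumed: raised by urllib inside pandas
        print("HTTP error:", http_error)
    except URLError as url_error:    # DNS failures, refused connections, etc.
        print("URL error:", url_error)
    return None
```

Calling first_table_from_url("https://clinicaltrials.gov/ct2/history/NCT02954874") would then give you the whole table, or None if the page could not be fetched. If you keep requests.get as in your question, pass html_data2.text to first_table_from_html instead.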
Outputs:
url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df.head()
print(data)
   Version    A    B     Submitted Date                               Changes
0        1  NaN  NaN   November 3, 2016  Nothing (earliest Version on record)
1        2  NaN  NaN  November 24, 2016   Contacts/Locations and Study Status
2        3  NaN  NaN  November 28, 2016   Recruitment Status and Study Status
3        4  NaN  NaN  December 15, 2016   Contacts/Locations and Study Status
4        5  NaN  NaN  December 19, 2016   Contacts/Locations and Study Status
Versus assigning the whole DataFrame:
url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df
print(data)
     Version    A    B     Submitted Date                               Changes
  0        1  NaN  NaN   November 3, 2016  Nothing (earliest Version on record)
  1        2  NaN  NaN  November 24, 2016   Contacts/Locations and Study Status
  2        3  NaN  NaN  November 28, 2016   Recruitment Status and Study Status
  3        4  NaN  NaN  December 15, 2016   Contacts/Locations and Study Status
  4        5  NaN  NaN  December 19, 2016   Contacts/Locations and Study Status
 ..      ...   ..   ..                ...                                   ...
558      559  NaN  NaN  December 19, 2019   Contacts/Locations and Study Status
559      560  NaN  NaN  December 20, 2019   Contacts/Locations and Study Status
560      561  NaN  NaN  December 23, 2019   Contacts/Locations and Study Status
561      562  NaN  NaN  December 25, 2019   Contacts/Locations and Study Status
562      563  NaN  NaN  December 27, 2019   Contacts/Locations and Study Status

[563 rows x 5 columns]
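Note that the ".." row above is only pandas shortening what it prints; all 563 rows are in the DataFrame. If you want every row displayed as well, something like the sketch below should work (again using a made-up stand-in DataFrame rather than the real table):

```python
import pandas as pd

# Hypothetical stand-in with the same number of rows as the real table.
df = pd.DataFrame({"Version": range(1, 564)})

truncated = str(df)      # what print(df) shows: middle rows elided
full = df.to_string()    # renders every row, regardless of display options

# Alternative: lift the row limit temporarily, then print as usual.
with pd.option_context("display.max_rows", None):
    also_full = str(df)

print(len(truncated.splitlines()), len(full.splitlines()))
```

to_string() is handy for a one-off dump, while option_context keeps the change local to the with block instead of altering the display settings globally.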
Upvotes: 2