Reputation: 21
I'm using pandas.read_html to get a table from a website, but it only returns the header row instead of the entire table. How can I fix this?
Code:
import pandas as pd
term_codes = {"fall":"10", "spring":"20", "summer":"30"}
# year must be last number in school year: 2021-2022 so we pick 2022
year = "2022"
department = "CSCI"
term_code = year + term_codes["fall"]
url = "https://courselist.wm.edu/courselist/courseinfo/searchresults?term_code=" + term_code + "&term_subj=" + department + "&attr=0&attr2=0&levl=0&status=0&ptrm=0&search=Search"
def findCourseTable():
    dfs = pd.read_html(url)
    print(dfs[0])
    #df = dfs[1]
    #df.to_csv(r'courses.csv', index=False)

if __name__ == "__main__":
    findCourseTable()
Output:
Empty DataFrame
Columns: [CRN, COURSE ID, CRSE ATTR, TITLE, INSTRUCTOR, CRDT HRS, MEET DAY:TIME, PROJ ENR, CURR ENR, SEATS AVAIL, STATUS]
Index: []
Upvotes: 1
Views: 1018
Reputation: 195418
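The page contains malformed HTML, which the default parser (lxml, when it is installed) only reads partially, hence the header-only result. Pass flavor="html5lib" to pd.read_html to read it correctly (this needs the html5lib and beautifulsoup4 packages installed):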
import pandas as pd
term_codes = {"fall": "10", "spring": "20", "summer": "30"}
# year must be last number in school year: 2021-2022 so we pick 2022
year = "2022"
department = "CSCI"
term_code = year + term_codes["fall"]
url = (
    "https://courselist.wm.edu/courselist/courseinfo/searchresults?term_code="
    + term_code
    + "&term_subj="
    + department
    + "&attr=0&attr2=0&levl=0&status=0&ptrm=0&search=Search"
)
df = pd.read_html(url, flavor="html5lib")[0]
print(df)
Prints:
     CRN    COURSE ID  CRSE ATTR                           TITLE                        INSTRUCTOR  CRDT HRS  MEET DAY:TIME  PROJ ENR  CURR ENR  SEATS AVAIL  STATUS
0  16064  CSCI 100 01  C100, NEW                  Reading@Russia  Willner, Dana; Prokhorova, Elena         4  MWF:1300-1350        10        10           0*  CLOSED
1  14614  CSCI 120 01        NaN  A Career in CS? And Which One?                     Kemper, Peter         1    M:1700-1750        36        20           16    OPEN
2  16325  CSCI 120 02        NEW    Concepts in Computer Science                   Deverick, James         3   TR:0800-0920        36        25           11    OPEN
3  12372  CSCI 140 01   NEW, NQR    Programming for Data Science                 Khargonkar, Arohi         4  MWF:0900-0950        36        24           12    OPEN
4  14620  CSCI 140 02   NEW, NQR    Programming for Data Science                 Khargonkar, Arohi         4  MWF:1100-1150        36        27            9    OPEN
5  13553  CSCI 140 03   NEW, NQR    Programming for Data Science                 Khargonkar, Arohi         4  MWF:1300-1350        36        25           11    OPEN
...and so on.
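If you want to reuse this for other terms or departments, here is a minimal sketch of the same approach split into two functions. The names build_url and read_course_table are my own (hypothetical, not part of pandas), and the query parameters are simply copied from the URL in the question, so treat them as assumptions about what the site expects. The empty-table check is there because a parser that trips over the malformed HTML can return just the header without raising an error.

```python
from urllib.parse import urlencode

# Base URL and term codes copied from the question.
BASE_URL = "https://courselist.wm.edu/courselist/courseinfo/searchresults"
TERM_CODES = {"fall": "10", "spring": "20", "summer": "30"}


def build_url(year, term, department):
    """Build the search URL (hypothetical helper).

    `year` is the later year of the school year, e.g. "2022" for 2021-2022.
    """
    params = {
        "term_code": year + TERM_CODES[term],
        "term_subj": department,
        # The remaining parameters are taken verbatim from the question's URL.
        "attr": "0",
        "attr2": "0",
        "levl": "0",
        "status": "0",
        "ptrm": "0",
        "search": "Search",
    }
    return BASE_URL + "?" + urlencode(params)


def read_course_table(url):
    """Read the first table on the page with the lenient html5lib parser."""
    # Imported here so build_url can be used without pandas installed.
    import pandas as pd

    tables = pd.read_html(url, flavor="html5lib")
    df = tables[0]
    if df.empty:
        # Header parsed but no rows: the symptom described in the question.
        raise ValueError("table has a header but no rows; check the parser or the URL")
    return df


# Example (performs a network request, so it is left commented out):
# df = read_course_table(build_url("2022", "fall", "CSCI"))
# df.to_csv("courses.csv", index=False)
```

Building the query string with urlencode keeps the parameters in one dict and takes care of escaping if a value ever contains special characters.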
Upvotes: 3