I'm trying to get the contents of an HTML table using XPath. The data is behind a search form, so I'm using MechanicalSoup to select and submit the form. Once I reach the second page, I grab its URL and pass it on for parsing, but I get AttributeError: 'list' object has no attribute 'xpath'.
import mechanicalsoup
import requests
from lxml import html
from lxml import etree
# Use MechanicalSoup to select the form, submit it and find the data table
browser = mechanicalsoup.StatefulBrowser()
winnet = "http://winnet.wartburg.edu/coursefinder/"
browser.open(winnet)
Searchform = browser.select_form()
Searchform.choose_submit('ctl00$ContentPlaceHolder1$FormView1$Button_FindNow')
response1 = browser.submit_selected() #This Progresses to Second Form
dataURL = 'https://winnet.wartburg.edu/coursefinder/Results.aspx' #Get URL of Second Form w/ Data
pageContent=requests.get(dataURL)
tree = html.fromstring(pageContent.content)
dataTable = tree.xpath('//*[@id="ctl00_ContentPlaceHolder1_GridView1"]')
print(dataTable)
for row in dataTable.xpath(".//tr")[1:]:
    print([cell.text_content() for cell in row.xpath(".//td")])
#XPath to Table
#//*[@id="ctl00_ContentPlaceHolder1_GridView1"]
I'd post the HTML I'm trying to parse, but it is extremely long and, like several other sites I've worked with, very sloppily written.
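I'm not sure, but I believe you are after something like the pandas snippet below; if that's not it, you can probably modify it to get where you want to be. The error itself comes from tree.xpath(), which always returns a list of matching elements, even when only one element can have that id. dataTable is therefore a list, which has no xpath method; take its first (and only) element with dataTable[0] and query that instead.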
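A small offline check shows that lxml's xpath() returns a list, and that indexing it with [0] yields an element that can be queried again. The HTML string below is a made-up stand-in for Results.aspx; only the table id is taken from the question.

```python
from lxml import html

# Hypothetical stand-in for the Results.aspx markup; only the table id comes from the question.
SAMPLE = """
<html><body>
<table id="ctl00_ContentPlaceHolder1_GridView1">
  <tr><th>Course</th><th>Title</th></tr>
  <tr><td>AC 121 01</td><td>Principles of Accounting I</td></tr>
  <tr><td>AC 122 01</td><td>Principles of Accounting II</td></tr>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE)
matches = tree.xpath('//*[@id="ctl00_ContentPlaceHolder1_GridView1"]')

# prints: list  (so matches.xpath(...) is what raises the AttributeError)
print(type(matches).__name__)

# Pull the single element out of the list before running the row/cell queries on it.
table = matches[0]
rows = [[cell.text_content() for cell in row.xpath(".//td")]
        for row in table.xpath(".//tr")[1:]]  # [1:] skips the header row
print(rows)
```

The same [0] indexing applies to the dataTable variable from the question.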
import pandas as pd

rows = []  # initialize a collection of rows
for row in dataTable[0].xpath(".//tr")[1:]:  # dataTable is a list; [0] is the table element
    rows.append([cell.text_content().strip() for cell in row.xpath(".//td")])  # add each row's cell text
df = pd.DataFrame(rows)  # load the collection into a DataFrame
print(df)  # in a notebook, evaluating df on its own also displays it
Output (pardon the formatting):
View Details AC 121 01 Principles of Accounting I Pilcher, A M W F 10:45AM-11:50AM 45/40/0 WBC 116 2019-20 WI 1.00
View Details AC 122 01 Principles of Accounting II Pilcher, A MWF 12:00PM-1:05PM 45/42/0 WBC 116 2019-20 WI 1.00
etc.
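Two refinements may help with the formatting, sketched here under assumptions: reading the header cells so the DataFrame gets labelled columns, and selecting data rows with the XPath predicate tr[td] (rows that contain at least one td) instead of slicing off the first row. This assumes the GridView renders its header as th cells, which is ASP.NET's usual output but is not confirmed by the question, and header_and_rows is a hypothetical helper name. It also raises a clear ValueError when the table is missing, where dataTable[0] would fail with IndexError.

```python
from lxml import html


def header_and_rows(page_bytes, table_id="ctl00_ContentPlaceHolder1_GridView1"):
    """Return (header, rows) for the table with the given id.

    Hypothetical helper: assumes header cells are <th> and data cells are <td>.
    """
    tree = html.fromstring(page_bytes)
    matches = tree.xpath(f'//table[@id="{table_id}"]')  # xpath() always returns a list
    if not matches:
        raise ValueError(f"no table with id {table_id!r} on this page")
    table = matches[0]

    def clean(cell):
        # Collapse runs of whitespace, e.g. "M  W\n F" -> "M W F".
        return " ".join(cell.text_content().split())

    header = [clean(th) for th in table.xpath(".//tr/th")]
    # tr[td] keeps only rows that have data cells, so the header row drops out on its own.
    rows = [[clean(td) for td in tr.xpath("./td")] for tr in table.xpath(".//tr[td]")]
    return header, rows
```

With the question's variables, something like header, rows = header_and_rows(pageContent.content) followed by pd.DataFrame(rows, columns=header or None) should give named columns. If the real grid has a pager row, it would also match tr[td], and pandas will complain if the header and row widths differ, so the XPath may need adjusting against the actual page.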