Reputation: 1043
I haven't been able to find a simple way to do this. I have been following this and have written the following:
## just comments before this
from lxml import html
import requests

page = requests.get('https://finalexams.rutgers.edu.html')

tree = html.fromstring(page.text)

tableRow = tree.xpath('//tr/text()')

print('Rows', tableRow)
The script needs to parse table rows like the ones below and pull out their contents, but there could be an arbitrary number of rows. I don't know how to access nested tags, and the rows don't have unique names or IDs for me to look for.
How can I write a for loop that gets each of these table rows and lets me grab the individual bits of them?
<tr>
<td> 04264</td>
<td>01:198:205</td>
<td>01</td>
<td>INTR DISCRET STRCT I</td>
<td>C</td>
<td>Dec 17, 2014: 8:00 AM - 11:00 AM </td>
</tr>
<tr>
<td> 09907</td>
<td>01:198:214</td>
<td>01</td>
<td>SYSTEMS PROGRAMMING</td>
<td>C</td>
<td>Dec 18, 2014: 8:00 PM - 11:00 PM </td>
</tr>
Upvotes: 0
Views: 164
Reputation: 365717
If you want the tr elements themselves rather than their text (which is effectively empty here, just the whitespace between the td children), search for the elements instead of their text:
rows = tree.xpath('//tr')
Then you can iterate over them:
for row in rows:
And then you can either search each one for td elements (e.g., by using row.xpath, or row.findall, etc.), or just assume all their children are td elements (as they happen to be in this case):
    for column in row:
And then you can do whatever it is you wanted to do with each column, like extract its text:
        print(column.text)
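Putting those fragments together, a minimal sketch might look like the following. It assumes the two rows from the question are wrapped in a table element (the question only shows the rows), and the helper name extract_rows is made up for illustration. It uses a relative XPath, './td', so each row only yields its own cells, and it guards against column.text being None for an empty cell.

```python
from lxml import html

# Sample rows copied from the question; the surrounding <table> is an assumption.
# For the live page you would pass requests.get(url).text instead.
SAMPLE = """
<table>
<tr>
<td> 04264</td>
<td>01:198:205</td>
<td>01</td>
<td>INTR DISCRET STRCT I</td>
<td>C</td>
<td>Dec 17, 2014: 8:00 AM - 11:00 AM </td>
</tr>
<tr>
<td> 09907</td>
<td>01:198:214</td>
<td>01</td>
<td>SYSTEMS PROGRAMMING</td>
<td>C</td>
<td>Dec 18, 2014: 8:00 PM - 11:00 PM </td>
</tr>
</table>
"""


def extract_rows(markup):
    """Return one list of stripped cell strings per <tr> (hypothetical helper)."""
    tree = html.fromstring(markup)
    table = []
    for row in tree.xpath('//tr'):
        # './td' is relative to this row, so only its own cells are matched.
        # 'td.text or ""' avoids an AttributeError when a cell is empty.
        cells = [(td.text or '').strip() for td in row.xpath('./td')]
        table.append(cells)
    return table


rows = extract_rows(SAMPLE)
for cells in rows:
    print(cells)
```

Because the loop runs over whatever '//tr' returns, it does not matter how many rows the page has.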
Upvotes: 3
Reputation: 473863
Iterate over all tr tags and make an inner loop over the td tags of every row. For example:
from lxml.html import fromstring

data = """
your html here
"""

root = fromstring(data)
# outer loop: one iteration per table row
for index, row in enumerate(root.xpath('//table/tr')):
    print("Row #%s" % index)
    # inner loop: the td cells of this row only
    for cell in row.findall('td'):
        print(cell.text.strip())
    print("----")
Prints:
Row #0
04264
01:198:205
01
INTR DISCRET STRCT I
C
Dec 17, 2014: 8:00 AM - 11:00 AM
----
Row #1
09907
01:198:214
01
SYSTEMS PROGRAMMING
C
Dec 18, 2014: 8:00 PM - 11:00 PM
----
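If you would rather have named fields than printed lines, one possible variation is sketched below. The column names in FIELDS are guesses at what the columns mean (the question never names them), and parse_exam_rows is a hypothetical helper, not part of lxml. Two details differ from the loop above: it matches '//tr' so that rows nested inside a tbody are still found, and it uses text_content(), which returns an empty string for an empty cell, where cell.text would be None and .strip() would fail.

```python
from lxml import html

# Hypothetical column names; the question does not say what the columns are called.
FIELDS = ("index", "course", "section", "title", "exam_code", "exam_time")


def parse_exam_rows(markup, fields=FIELDS):
    """Return a list of dicts, one per <tr> that has len(fields) <td> cells."""
    root = html.fromstring(markup)
    records = []
    # '//tr' also matches rows wrapped in <tbody>, which '//table/tr' would miss
    for row in root.xpath('//tr'):
        # text_content() yields '' for an empty <td> instead of None
        values = [cell.text_content().strip() for cell in row.findall('td')]
        if len(values) != len(fields):
            continue  # skip header rows (<th> only) or rows with another shape
        records.append(dict(zip(fields, values)))
    return records


# First row from the question, with the "C" cell emptied on purpose and a
# made-up header row and <tbody> added to exercise the edge cases.
markup = """
<table><tbody>
<tr><th>Index</th><th>Course</th></tr>
<tr>
<td> 04264</td>
<td>01:198:205</td>
<td>01</td>
<td>INTR DISCRET STRCT I</td>
<td></td>
<td>Dec 17, 2014: 8:00 AM - 11:00 AM </td>
</tr>
</tbody></table>
"""

records = parse_exam_rows(markup)
for record in records:
    print(record["course"], record["title"], record["exam_time"])
```

Skipping rows whose cell count does not match is a design choice for this sketch; you may prefer to raise an error instead so that layout changes on the page do not go unnoticed.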
Upvotes: 0