meowsephine

Reputation: 392

Loop iterating through all rows instead of iterating through each row separately

Here is the table that I am trying to get data from:

<table class="statBlock" cellspacing="0">
<tr>
    <th>
        <a href="/srd/magicOverview/spellDescriptions.htm#level">Level</a>:
    </th>
    <td>
        <a href="/srd/spellLists/clericSpells.htm#thirdLevelClericSpells">Clr 3</a>
    </td>
</tr>
<tr>
    <th>
        <a href="/srd/magicOverview/spellDescriptions.htm#components">Components</a>:
    </th>
    <td>
        V, S
    </td>
</tr>
<tr>
    <th>
        <a href="/srd/magicOverview/spellDescriptions.htm#castingTime">Casting Time</a>:
    </th>
    <td>
        1 <a href="/srd/combat/actionsInCombat.htm#standardActions">standard action</a>
    </td>
</tr>

ETC...

This is the Scrapy code that I have so far for parsing:

    for sel in response.xpath('//tr'):
        string = " ".join(response.xpath('//th/a/text()').extract()) + ":" + " ".join(response.xpath('//td/text()').extract())
        print string

But this yields a result like this:

Level Components Casting Time Range Effect Duration Saving Throw Spell Resistance:V, S, M, XP 12 hours 0 ft. One duplicate creature Instantaneous None No

When the output should look something like

Level: CLR 1  Components:V, S, M etc...

Essentially, for some reason it isn't looping through each row of the table, finding the one <th> and the one <td> cell in each, and sticking them together. Instead, it's finding all of the data from every <th> and all of the data from every <td>, and then sticking those two sets together. I assume my for statement needs to be fixed. How do I get it to examine each row individually?

Upvotes: 1

Views: 772

Answers (1)

Anand S Kumar

Reputation: 90999

When you query an xpath like -

response.xpath('//th/a/text()')

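This returns the text of every <a> inside a <th> anywhere in the document, not just in the current row, because a path starting with // is evaluated from the document root and you are calling it on response rather than on sel. That is not what you want. You should query relative to sel instead -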

for sel in response.xpath('//tr'):
    # The leading dot makes each query relative to the current <tr>.
    header = " ".join(sel.xpath('.//th/a/text()').extract())
    value = " ".join(sel.xpath('.//td/text()').extract())
    print(header + ":" + value)

The leading dot in the XPath inside the loop makes the query run relative to the current node (sel, the current <tr>) rather than from the document root.
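
To see the difference without running a spider, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selectors. That substitution is an assumption made so the snippet runs on its own; the hrefs are shortened and a closing </table> is added so the fragment parses as XML. The helper cell_text is a hypothetical name introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the table from the question (hrefs shortened, </table> added).
HTML = """
<table class="statBlock" cellspacing="0">
<tr>
    <th><a href="#level">Level</a>:</th>
    <td><a href="#thirdLevelClericSpells">Clr 3</a></td>
</tr>
<tr>
    <th><a href="#components">Components</a>:</th>
    <td>V, S</td>
</tr>
<tr>
    <th><a href="#castingTime">Casting Time</a>:</th>
    <td>1 <a href="#standardActions">standard action</a></td>
</tr>
</table>
"""

root = ET.fromstring(HTML.strip())


def cell_text(element):
    # Hypothetical helper: join all text inside the element, including text
    # nested in <a> tags, and collapse runs of whitespace.
    return " ".join("".join(element.itertext()).split())


# Querying from the root inside the loop: every row sees every header.
from_root = [[a.text for a in root.findall(".//th/a")]
             for _ in root.findall(".//tr")]

# Querying from each row: every row sees only its own header and cell.
rows = []
for tr in root.findall(".//tr"):
    header = " ".join(a.text for a in tr.findall(".//th/a"))
    value = " ".join(cell_text(td) for td in tr.findall(".//td"))
    rows.append(header + ": " + value)

for line in rows:
    print(line)
```

One thing to watch for in the Scrapy version: './/td/text()' selects only text that is a direct child of the <td>, so a value wrapped in a link, such as Clr 3, would be skipped. Using './/td//text()' would also pick up the nested text.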

More details on relative XPaths are in the "Working with relative XPaths" section of the Scrapy selectors documentation.

Upvotes: 2
