meme creep

Reputation: 33

Python/scrapy nested for/if loop working incorrectly

I'm using Scrapy to crawl a list of users and their SteamIDs from www.tf2items.com/profiles/.

Currently, my code looks like this:

import scrapy

bot_words = [
    "bot",
    "BOT",
    "[tf2mart]"
]

class AccountSpider(scrapy.Spider):
    name = "accounts"
    start_urls = [
        'file:///Users/max/Documents/promotebot/tutorial/tutorial/TF2ITEMS.htm'
    ]

    def parse(self, response):
        for tr in response.css("tbody"):
            user = response.css("span a").extract()
            print(user)
            if bot_words not in response.css("span a").extract():
                for href in response.css("span a::attr(href)").extract():
                    #yield response.follow("http://www.backpack.tf" + href, self.parse_accounts)
                    print("this is a value")

My final goal is for this code to print something like:

<a href="/profiles/76561198042757507">Kchypark</a>
this is a value
<a href="/profiles/76561198049853548">Agen Kolar</a>
this is a value
<a href="/profiles/76561198036381323">Grave Shifter15</a>
this is a value

Even with the current code, I would at least expect:

<a href="/profiles/76561198042757507">Kchypark</a>
this is a value
this is a value
this is a value
<a href="/profiles/76561198049853548">Agen Kolar</a>
this is a value
this is a value
this is a value
<a href="/profiles/76561198036381323">Grave Shifter15</a>
this is a value
this is a value
this is a value

However, I get:

<a href="/profiles/76561198042757507">Kchypark</a>
<a href="/profiles/76561198049853548">Agen Kolar</a>
<a href="/profiles/76561198036381323">Grave Shifter15</a>
this is a value
this is a value
this is a value

What am I doing wrong?

Upvotes: 1

Views: 164

Answers (1)

Danil

Reputation: 5181

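Your first print outputs the whole list of links at once, because response.css("span a") searches the entire page rather than the element you are iterating over: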

user = response.css("span a").extract()
print(user)
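To see why the names and the messages come out in two separate groups, here is a small pure-Python simulation of the two loop shapes. It does not use Scrapy: the hypothetical links list stands in for what response.css("span a").extract() returns, and each function returns the lines that would be printed instead of printing them.

```python
# Stand-in for response.css("span a").extract(): one HTML string per <a> element.
links = [
    '<a href="/profiles/76561198042757507">Kchypark</a>',
    '<a href="/profiles/76561198049853548">Agen Kolar</a>',
    '<a href="/profiles/76561198036381323">Grave Shifter15</a>',
]


def question_shape(links):
    """Mimic the question: the outer loop body runs once (a single <tbody>)."""
    lines = []
    lines.extend(links)      # print(user) dumps every link at once (as one list)
    for _href in links:      # then one message per href, after all the names
        lines.append("this is a value")
    return lines


def per_user_shape(links):
    """Loop over each link and emit its message right after it."""
    lines = []
    for link in links:
        lines.append(link)
        lines.append("this is a value")
    return lines


for line in per_user_shape(links):
    print(line)
```

Only the per-link loop interleaves each name with its message, which is the output the question is after.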

Your code should look something like this:

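def parse(self, response):
    for tr in response.css("tbody"):
        # Query inside the current element (tr), not the whole page (response)
        for user in tr.css("span a"):
            name = user.css("::text").get(default="")
            # Skip the account if its name contains any of the bot words
            if not any(word in name for word in bot_words):
                print(user.get())
                href = user.css("::attr(href)").get()
                print(href)
                #yield response.follow("http://www.backpack.tf" + href, self.parse_accounts)
                print("this is a value")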
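The bot check in the question also never filters anything. bot_words not in some_list asks whether the whole bot_words list is itself an element of some_list, which is never the case, so the condition is always true. A minimal sketch of a substring-based check instead, assuming the display name is the text to test (is_bot is a hypothetical helper name, and the sample names are made up):

```python
bot_words = ["bot", "BOT", "[tf2mart]"]


def is_bot(name, words=bot_words):
    """Return True if any keyword appears as a substring of the name."""
    return any(word in name for word in words)


names = ["Kchypark", "Agen Kolar", "TradeBOT #3", "[tf2mart] Seller"]

# The question's style of check: is the *list* bot_words an element of names?
print(bot_words not in names)  # True, whatever names contains

# Keyword filter: keep only names that contain none of the bot words.
humans = [n for n in names if not is_bot(n)]
print(humans)  # ['Kchypark', 'Agen Kolar']
```

The check is case-sensitive, which is presumably why the list holds both "bot" and "BOT"; comparing lower-case keywords against name.lower() would also catch spellings such as "Bot".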

Also, best practice in Scrapy is to yield items (for example, plain dicts) instead of printing raw values.

And avoid running the same query repeatedly, such as response.css("span a").extract(): run the selector once, store the result in a variable, and reuse it.
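A Scrapy callback can yield plain dicts as items, and feed exports (for example scrapy crawl accounts -o accounts.json) write them to a file. The sketch below keeps the filtering logic in a pure-Python generator so it can be tested without a crawl. account_items and the name/profile/steamid field names are hypothetical, and it assumes each profile link ends in the SteamID, as in the hrefs shown in the question.

```python
bot_words = ["bot", "BOT", "[tf2mart]"]


def account_items(links):
    """Yield one dict per non-bot account.

    links: iterable of (name, href) pairs, e.g. built inside parse() from
    a.css("::text").get() and a.css("::attr(href)").get().
    """
    for name, href in links:
        if any(word in name for word in bot_words):
            continue  # skip bot accounts
        yield {
            "name": name,
            "profile": href,
            # assumes hrefs look like "/profiles/<steamid>"
            "steamid": href.rstrip("/").rsplit("/", 1)[-1],
        }


# Inside the spider this could become (the selector query runs only once):
#     def parse(self, response):
#         pairs = ((a.css("::text").get(default=""), a.css("::attr(href)").get())
#                  for a in response.css("span a"))
#         yield from account_items(pairs)

sample = [
    ("Kchypark", "/profiles/76561198042757507"),
    ("TradeBOT", "/profiles/0"),  # made-up bot entry
]
items = list(account_items(sample))
print(items)
```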

Upvotes: 1
