Reputation: 33
I'm using Scrapy to crawl a list of users and their SteamIDs from www.tf2items.com/profiles/.
Currently, my code looks like this:
import scrapy

bot_words = [
    "bot",
    "BOT",
    "[tf2mart]"
]

class AccountSpider(scrapy.Spider):
    name = "accounts"
    start_urls = [
        'file:///Users/max/Documents/promotebot/tutorial/tutorial/TF2ITEMS.htm'
    ]

    def parse(self, response):
        for tr in response.css("tbody"):
            user = response.css("span a").extract()
            print(user)
            if bot_words not in response.css("span a").extract():
                for href in response.css("span a::attr(href)").extract():
                    #yield response.follow("http://www.backpack.tf" + href, self.parse_accounts)
                    print("this is a value")
My final goal is for this code to print something like:
a href="/profiles/76561198042757507">Kchypark
this is a value
a href="/profiles/76561198049853548">Agen Kolar
this is a value
a href="/profiles/76561198036381323">Grave Shifter15
this is a value
With this current code, I could even expect
a href="/profiles/76561198042757507">Kchypark
this is a value
this is a value
this is a value
a href="/profiles/76561198049853548">Agen Kolar
this is a value
this is a value
this is a value
a href="/profiles/76561198036381323">Grave Shifter15
this is a value
this is a value
this is a value
However, I get:
a href="/profiles/76561198042757507">Kchypark
a href="/profiles/76561198049853548">Agen Kolar
a href="/profiles/76561198036381323">Grave Shifter15
this is a value
this is a value
this is a value
What am I doing wrong?
Upvotes: 1
Views: 164
Reputation: 5181
Your first print outputs the whole list of matched links at once, because extract() returns a list of every match:
user = response.css("span a").extract()
print(user)
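The ordering effect can be reproduced without Scrapy. The sketch below is only an illustration: `links` is a hypothetical stand-in for the list returned by `response.css("span a").extract()`, and both helper functions are made up for this example. It collects the lines that each loop shape would print.

```python
# Hypothetical stand-in for the list returned by response.css("span a").extract().
links = ["Kchypark", "Agen Kolar", "Grave Shifter15"]


def list_then_loop(links):
    """Mimic the question's shape: print the whole list once, then loop."""
    lines = [str(links)]
    for _ in links:
        lines.append("this is a value")
    return lines


def one_at_a_time(links):
    """Handle each link individually so its output stays next to it."""
    lines = []
    for link in links:
        lines.append(link)
        lines.append("this is a value")
    return lines


grouped = list_then_loop(links)      # all names first, then three values
interleaved = one_at_a_time(links)   # name, value, name, value, ...
```

The first shape yields every name up front followed by the repeated message, which matches the output in the question; the second interleaves them, which is the output the question asks for.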
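Your code should look something like this:
def parse(self, response):
    for tr in response.css("tbody"):
        # Query relative to the current tbody and handle one link at a time.
        for user in tr.css("span a"):
            name = user.css("::text").extract_first(default="")
            # Skip accounts whose display name contains a bot marker.
            if any(word in name for word in bot_words):
                continue
            print(user.extract())
            href = user.css("::attr(href)").extract_first()
            print(href)
            #yield response.follow("http://www.backpack.tf" + href, self.parse_accounts)
            print("this is a value")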
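A separate issue is the check in the question, `bot_words not in response.css("span a").extract()`. It asks whether the list `bot_words` is itself an element of the extracted list, which is never the case, so the condition is always true. A minimal sketch of a per-name filter follows; the sample names are invented, `is_bot` is a hypothetical helper, and case-sensitive substring matching is an assumption about what the filter is meant to do.

```python
bot_words = ["bot", "BOT", "[tf2mart]"]


def is_bot(name, words=bot_words):
    """Return True if the display name contains any bot marker.

    Assumption: a case-sensitive substring match is what is wanted.
    """
    return any(word in name for word in words)


# Made-up sample names, for illustration only.
names = ["Kchypark", "Trade BOT 7", "[tf2mart] seller"]

# A list is never an element of a list of strings, so this is always True.
always_true = bot_words not in names

# Filtering name by name keeps only the accounts without a bot marker.
humans = [name for name in names if not is_bot(name)]
```

Substring matching is blunt (a name such as "Abbott" would also match "bot"), so a stricter rule may be needed for real data.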
Also, best practice in Scrapy is to yield items instead of calling the raw print function.
And watch out for duplicated code, such as the repeated response.css("span a").extract() calls.
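As a rough sketch of the items suggestion: Scrapy accepts plain dicts as items, so `parse` could yield one dict per user instead of printing. The helper `build_item` and its field names are hypothetical, and taking the last path segment of the href as the SteamID is an assumption based on the `/profiles/<id>` links shown in the question.

```python
def build_item(name, href):
    """Build a dict item for one user (hypothetical field names).

    Assumption: the SteamID is the last path segment of the profile href.
    """
    steam_id = href.rstrip("/").rsplit("/", 1)[-1]
    return {"name": name, "steam_id": steam_id, "profile_path": href}


# Example with a value taken from the question's expected output.
item = build_item("Kchypark", "/profiles/76561198042757507")
```

Inside the spider, `yield build_item(name, href)` would take the place of the print calls, and Scrapy's feed exports could then write the results to a file.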
Upvotes: 1