Reputation: 4122
I am still trying to get my head around json.loads and json.dumps to extract what I want from a web page. I am after some data from this link, which takes this format:
data: {
    url: 'stage-player-stat'
},
defaultParams: {
    stageId: 9155,
    teamId: 32,
    playerId: -1,
    field: 2
},
The code I am using is this:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.item import Item
from scrapy.spider import BaseSpider
from scrapy import log
from scrapy.cmdline import execute
from scrapy.utils.markup import remove_tags
import time
import re
import json
import requests


class ExampleSpider(CrawlSpider):
    name = "goal2"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com/Teams/32/"]
    rules = [Rule(SgmlLinkExtractor(allow=('\Teams'), deny=(),), follow=False, callback='parse_item')]

    def parse_item(self, response):
        stagematch = re.compile("data:\s*{\s*url:\s*'stage-player-stat'\s*},\s*defaultParams:\s*{\s*(.*?),.*},", re.S)
        stagematch2 = re.search(stagematch, response.body)
        if stagematch2 is not None:
            stagematch3 = stagematch2.group(1)
            stageid = json.dumps(stagematch3)
            print "stageid = ", stageid

execute(['scrapy','crawl','goal2'])
In this example, stageid resolves to "stageId: 9155". What I want it to resolve to, though, is just 9155. I have tried to parse it with stageid = stageid[0], as if it were a dictionary, but this is not working. What am I doing wrong?
Thanks
Upvotes: 0
Views: 429
Reputation: 20748
A solution using js2xml: parse each <script> block's contents, look for the var defaultTeamPlayerStatsConfigParams declaration and grab its init object, then call js2xml.jsonlike.make_dict() on it to get a Python dict. Here's how it goes, illustrated in this scrapy shell session:
$ scrapy shell http://www.whoscored.com/Teams/32/
2014-09-08 11:17:31+0200 [scrapy] INFO: Scrapy 0.24.4 started (bot: scrapybot)
...
2014-09-08 11:17:32+0200 [default] DEBUG: Crawled (200) <GET http://www.whoscored.com/Teams/32/> (referer: None)
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x7f88f0605990>
[s] item {}
[s] request <GET http://www.whoscored.com/Teams/32/>
[s] response <200 http://www.whoscored.com/Teams/32/>
[s] settings <scrapy.settings.Settings object at 0x7f88f6046450>
[s] spider <Spider 'default' at 0x7f88efdaff50>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
In [1]: import pprint
In [2]: import js2xml
In [3]: for script in response.xpath('//script/text()').extract():
   ...:     jstree = js2xml.parse(script)
   ...:     params = jstree.xpath('//var[@name="defaultTeamPlayerStatsConfigParams"]/object')
   ...:     if params:
   ...:         pprint.pprint(js2xml.jsonlike.make_dict(params[0]))
   ...:
{'data': {'url': 'stage-player-stat'},
'defaultParams': {'field': 2, 'playerId': -1, 'stageId': 9155, 'teamId': 32},
'fitText': {'container': '.grid .team-link, .grid .player-link',
'options': {'width': 150}},
'fixZeros': True}
In [4]: for script in response.xpath('//script/text()').extract():
   ...:     jstree = js2xml.parse(script)
   ...:     params = jstree.xpath('//var[@name="defaultTeamPlayerStatsConfigParams"]/object')
   ...:     if params:
   ...:         params = js2xml.jsonlike.make_dict(params[0])
   ...:         print params["defaultParams"]["stageId"]
   ...:
9155
In [5]:
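If you want this in the spider itself rather than in the shell, the same logic can live in your parse_item callback. A minimal sketch, reusing the class and imports from the question (untested beyond the shell session above):

import js2xml

class ExampleSpider(CrawlSpider):
    # name, allowed_domains, start_urls and rules as in the question

    def parse_item(self, response):
        # Walk every <script> block, pick out the one declaring
        # defaultTeamPlayerStatsConfigParams, and turn its init object into a dict.
        for script in response.xpath('//script/text()').extract():
            jstree = js2xml.parse(script)
            params = jstree.xpath('//var[@name="defaultTeamPlayerStatsConfigParams"]/object')
            if params:
                config = js2xml.jsonlike.make_dict(params[0])
                stageid = config["defaultParams"]["stageId"]  # already an int: 9155
                print "stageid = ", stageid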
Upvotes: 2
Reputation: 87201
In your parse_item, you can pull the number straight out of the matched group instead of calling json.dumps on it:
stagematch3 = stagematch2.group(1)
stageid = int(stagematch3.split(':', 1)[1])
Then you may convert it back to str if you wish:
stageid = str(stageid)
There are many other ways to solve your problem. One of them is to use a simpler regexp and then parse the matched text with json.loads.
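For example, here is a rough sketch of that json.loads route. The pattern is hypothetical and assumes the defaultParams block looks like the snippet in the question; note that the bare JS keys have to be quoted before json.loads will accept them:

import json
import re

# response.body is the raw HTML, as in your parse_item
match = re.search(r"defaultParams:\s*({.*?})", response.body, re.S)
if match is not None:
    raw = match.group(1)                           # "{ stageId: 9155, teamId: 32, ... }"
    as_json = re.sub(r"(\w+)\s*:", r'"\1":', raw)  # quote the bare keys to make it valid JSON
    params = json.loads(as_json)
    print params["stageId"]                        # -> 9155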
Upvotes: 1