Anoop D

Reputation: 1850

Scraping non-formatted JSON

My online JSON file looks like this:

data(
  [
    {
      "CCODE": "15ET",
      "CNAME": "JOE",
      "CAGE": 32
    },{
      "CCODE": "15ET",
      "CNAME": "JOE",
      "CAGE": 32
    },{
      "CCODE": "15ET",
      "CNAME": "JOE",
      "CAGE": 32
    }
  ]
)

I am trying to scrape it using Scrapy, but json.loads(response.body_as_unicode()) raises JSONDecodeError: Expecting value, because the response is not valid JSON: the array is wrapped in a data( ... ) call. Is there a workaround for this problem?

Upvotes: 0

Views: 120

Answers (1)

gangabass

Reputation: 10666

The response is JSON wrapped in a JavaScript-style data(...) call, so you first need to strip that wrapper with a regular expression, and then you can use json.loads():

import json
import re
json_str = re.search(r'data\((.+)\)$', response.body, flags=re.DOTALL).group(1)
data = json.loads(json_str)

UPDATE: On Python 3, response.body is bytes, so match against response.text (a str) instead:

json_str = re.search(r'data\((.+)\)$', response.text, flags=re.DOTALL).group(1)
data = json.loads(json_str)
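
To try the same idea outside a spider, here is a minimal sketch that wraps it in a small helper and runs it on a shortened copy of the sample from the question. The names parse_jsonp and JSONP_RE are hypothetical, not part of Scrapy, and the pattern is an assumption that is slightly looser than the one above: it accepts any callback name and tolerates trailing whitespace or a semicolon.

```python
import json
import re

# Hypothetical pattern: <callback name>( <payload> ) with optional trailing ";".
# re.DOTALL lets "." span the newlines inside the payload.
JSONP_RE = re.compile(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def parse_jsonp(text):
    """Strip a callback(...) wrapper from text and parse the JSON inside it."""
    match = JSONP_RE.match(text)
    if match is None:
        raise ValueError("response is not wrapped in a callback(...) call")
    return json.loads(match.group(1))


# Shortened copy of the sample from the question.
sample = '''data(
      [
        {"CCODE": "15ET", "CNAME": "JOE", "CAGE": 32},
        {"CCODE": "15ET", "CNAME": "JOE", "CAGE": 32}
      ]
    )
'''

records = parse_jsonp(sample)
print(records[0]["CNAME"])
```

In a spider callback you would call parse_jsonp(response.text). Raising ValueError when the wrapper is missing gives a clearer message than the AttributeError you get from calling .group(1) on a failed re.search.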

Upvotes: 1
