Slang I'mma talk
Slang I'mma talk

Reputation: 63

how do you extract data from json using beautifulsoup in django

Good day. I'm facing an issue while trying to extract values from json. First of all my beautifulsoup works very fine in the shell, but not in django. and also what I'm trying to achieve is extracting data from the received json, but with no success. Here's the class in my view doing it:

class FetchWeather(generic.TemplateView):
    template_name = 'forecastApp/pages/weather.html'

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        url = 'http://weather.news24.com/sa/cape-town'
        city = 'cape town'
        url_request = requests.get(url)
        soup = BeautifulSoup(url_request.content, 'html.parser')
        city_list = soup.find(id="ctl00_WeatherContentHolder_ddlCity")
        print(soup.head)
        city_as_on_website = city_list.find(text=re.compile(city, re.I)).parent
        cityId = city_as_on_website['value']
        json_url = "http://weather.news24.com/ajaxpro/TwentyFour.Weather.Web.Ajax,App_Code.ashx"

        headers = {
            'Content-Type': 'text/plain; charset=UTF-8',
            'Host': 'weather.news24.com',
            'Origin': 'http://weather.news24.com',
            'Referer': url,
            'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.82 Chrome/48.0.2564.82 Safari/537.36',
            'X-AjaxPro-Method': 'GetCurrentOne'}

        payload = {
            "cityId": cityId
        }
        request_post = requests.post(json_url, headers=headers, data=json.dumps(payload))
        print(request_post.content)
        context['Observations'] = request_post.content
        return context

In the json, there's a array "Observations" from which I'm trying to get the city name, the temperature high and low.

but when I tried to do this:

cityDict = json.loads(str(html))

I'm receiving an error. Here's the traceback to it:

 Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 4067 (char 4066)

any help will be gladly appreciated.

Upvotes: 1

Views: 331

Answers (1)

alecxe
alecxe

Reputation: 473863

There are two problems with your JSON data inside request_post.content:

  • there are JS date object values there, for instance:

    "Date":new Date(Date.UTC(2016,1,26,22,0,0,0))
    
  • there are unwanted characters at the end: ;/*".

Let's clean the JSON data so that it can be loaded with json:

from datetime import datetime

data = request_post.text

def convert_date(match):
    return '"' + datetime(*map(int, match.groups())).strftime("%Y-%m-%dT%H:%M:%S") + '"'

data = re.sub(r"new Date\(Date\.UTC\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)\)",
              convert_date,
              data)

data = data.strip(";/*")
data = json.loads(data)

context['Observations'] = data

Upvotes: 1

Related Questions