How to use Python to extract data from the Met Office JSON download

I am using Python 3.4.

I have started a project to download the UK Met Office forecast data (in JSON format) and use it as a weather compensator for my home heating system. I have succeeded in downloading the JSON data file from the Met Office, and now I want to extract the information I need. I can do this by converting the file to a string and using .find and int() to pull out the values, which works but seems crude. As JSON is a widely used data interchange format, there must be a better way. I have found json.load, json.loads and json.JSONDecoder.decode, but I haven't had any success with them, and I really have little idea of what I am doing!

My code is:

import urllib.request
import json

#Comment:  THIS IS THE CALL TO GET THE MET OFFICE FILE FROM THE INTERNET
#Comment:  **** = my personal met office API key, which I had better keep to myself

response = urllib.request.urlopen('http://datapoint.metoffice.gov.uk/public/data/val/wxfcs/all/json/354037?res=3hourly&key=****')

FCData    = response.read()
FCDataStr = str(FCData)

#Comment:   END OF THE CALL TO GET MET OFFICE FILE FROM THE INTERNET
#Comment:   Example of data extraction

ChPos = FCDataStr.find('"DV"')      #Find "DV"    
ChPos = FCDataStr.find('"dataDate"', ChPos, ChPos+50)      #Find "dataDate"

FileDataDate = FCDataStr[ChPos+12:ChPos+22]                #Extract the date of the file

#Comment:   And so on

When using json.loads(FCDataStr) I get the following error message:

"ValueError: Expecting value: line 1 column 1 (char 0)"

By deleting the b' at the start and the ' at the end, this error goes away (see below). Printing the JSON file in string format, using print(FCDataStr) gives:

b'{"SiteRep":{"Wx":{"Param":[{"name":"F","units":"C","$":"Feels Like Temperature"},{"name":"G","units":"mph","$":"Wind Gust"},{"name":"H","units":"%","$":"Screen Relative Humidity"},{"name":"T","units":"C","$":"Temperature"},{"name":"V","units":"","$":"Visibility"},{"name":"D","units":"compass","$":"Wind Direction"},{"name":"S","units":"mph","$":"Wind Speed"},{"name":"U","units":"","$":"Max UV Index"},{"name":"W","units":"","$":"Weather Type"},{"name":"Pp","units":"%","$":"Precipitation Probability"}]},"DV":{"dataDate":"2014-07-29T20:00:00Z","type":"Forecast","Location":{"i":"354037","lat":"51.7049","lon":"-2.9022","name":"USK","country":"WALES","continent":"EUROPE","elevation":"43.0","Period":[{"type":"Day","value":"2014-07-29Z","Rep":[{"D":"NNW","F":"22","G":"11","H":"51","Pp":"4","S":"9","T":"24","V":"VG","W":"7","U":"7","$":"900"},{"D":"NW","F":"19","G":"16","H":"61","Pp":"8","S":"11","T":"22","V":"EX","W":"8","U":"1","$":"1080"},{"D":"NW","F":"16","G":"20","H":"70","Pp":"1","S":"11","T":"18","V":"VG","W":"2","U":"0","$":"1260"}]},{"type":"Day","value":"2014-07-30Z","Rep":[{"D":"NW","F":"13","G":"16","H":"84","Pp":"0","S":"7","T":"14","V":"VG","W":"0","U":"0","$":"0"},{"D":"WNW","F":"12","G":"13","H":"90","Pp":"0","S":"7","T":"13","V":"VG","W":"0","U":"0","$":"180"},{"D":"WNW","F":"13","G":"11","H":"87","Pp":"0","S":"7","T":"14","V":"GO","W":"1","U":"1","$":"360"},{"D":"SW","F":"18","G":"9","H":"67","Pp":"0","S":"4","T":"19","V":"VG","W":"1","U":"2","$":"540"},{"D":"WNW","F":"21","G":"13","H":"56","Pp":"0","S":"9","T":"22","V":"VG","W":"3","U":"6","$":"720"},{"D":"W","F":"21","G":"20","H":"55","Pp":"0","S":"11","T":"23","V":"VG","W":"3","U":"6","$":"900"},{"D":"W","F":"18","G":"22","H":"57","Pp":"0","S":"11","T":"21","V":"VG","W":"1","U":"2","$":"1080"},{"D":"WSW","F":"16","G":"13","H":"80","Pp":"0","S":"7","T":"16","V":"VG","W":"0","U":"0","$":"1260"}]},{"type":"Day","value":"2014-07-31Z","Rep":[{"D":"SW","F":"14","G":"11","H":"91","Pp":"0","S":"4","T":"15","V"
:"GO","W":"0","U":"0","$":"0"},{"D":"SW","F":"14","G":"11","H":"92","Pp":"0","S":"4","T":"14","V":"GO","W":"0","U":"0","$":"180"},{"D":"SW","F":"15","G":"11","H":"89","Pp":"3","S":"7","T":"16","V":"GO","W":"3","U":"1","$":"360"},{"D":"WSW","F":"17","G":"20","H":"79","Pp":"28","S":"11","T":"18","V":"GO","W":"3","U":"2","$":"540"},{"D":"WSW","F":"18","G":"22","H":"72","Pp":"34","S":"11","T":"20","V":"GO","W":"10","U":"5","$":"720"},{"D":"WSW","F":"18","G":"22","H":"66","Pp":"13","S":"11","T":"20","V":"VG","W":"7","U":"5","$":"900"},{"D":"WSW","F":"17","G":"22","H":"69","Pp":"36","S":"11","T":"19","V":"VG","W":"10","U":"2","$":"1080"},{"D":"WSW","F":"16","G":"16","H":"84","Pp":"6","S":"9","T":"17","V":"GO","W":"2","U":"0","$":"1260"}]},{"type":"Day","value":"2014-08-01Z","Rep":[{"D":"SW","F":"16","G":"13","H":"91","Pp":"4","S":"7","T":"16","V":"GO","W":"7","U":"0","$":"0"},{"D":"SW","F":"15","G":"11","H":"93","Pp":"5","S":"7","T":"16","V":"GO","W":"7","U":"0","$":"180"},{"D":"SSW","F":"15","G":"11","H":"93","Pp":"7","S":"7","T":"16","V":"GO","W":"7","U":"1","$":"360"},{"D":"SSW","F":"17","G":"18","H":"79","Pp":"14","S":"9","T":"18","V":"GO","W":"7","U":"2","$":"540"},{"D":"SSW","F":"17","G":"22","H":"74","Pp":"43","S":"11","T":"19","V":"GO","W":"10","U":"5","$":"720"},{"D":"SW","F":"16","G":"22","H":"81","Pp":"48","S":"11","T":"18","V":"GO","W":"10","U":"5","$":"900"},{"D":"SW","F":"16","G":"18","H":"80","Pp":"55","S":"9","T":"17","V":"GO","W":"12","U":"1","$":"1080"},{"D":"SSW","F":"15","G":"16","H":"89","Pp":"38","S":"7","T":"16","V":"GO","W":"9","U":"0","$":"1260"}]},{"type":"Day","value":"2014-08-02Z","Rep":[{"D":"S","F":"14","G":"11","H":"94","Pp":"15","S":"7","T":"15","V":"GO","W":"7","U":"0","$":"0"},{"D":"SSE","F":"14","G":"11","H":"94","Pp":"16","S":"7","T":"15","V":"GO","W":"7","U":"0","$":"180"},{"D":"S","F":"14","G":"13","H":"93","Pp":"36","S":"7","T":"15","V":"GO","W":"10","U":"1","$":"360"},{"D":"S","F":"15","G":"20","H":"84","Pp":"62","S":"11","T":"17","
V":"GO","W":"14","U":"2","$":"540"},{"D":"SSW","F":"16","G":"22","H":"78","Pp":"63","S":"11","T":"18","V":"GO","W":"14","U":"5","$":"720"},{"D":"WSW","F":"16","G":"27","H":"66","Pp":"59","S":"13","T":"19","V":"VG","W":"14","U":"5","$":"900"},{"D":"WSW","F":"15","G":"25","H":"68","Pp":"39","S":"13","T":"18","V":"VG","W":"10","U":"2","$":"1080"},{"D":"SW","F":"14","G":"16","H":"80","Pp":"28","S":"9","T":"15","V":"VG","W":"0","U":"0","$":"1260"}]}]}}}}'

The result of using:

DecodedJSON = json.loads(FCDataStr)
print(DecodedJSON)

gives a very similar result to the original FCDataStr file.

How do I proceed to extract the data (such as temperature, wind speed etc for each 3 hourly forecast) from the file?

Upvotes: 0

Views: 3228

Answers (3)

SianiAnni

Reputation: 1

I have been parsing the Met Office DataPoint output.

Thanks to the response above I have something that works for me.

I am writing the data I am interested in to a CSV file:

import sys
import os
import urllib.request
import json

###  THIS IS THE CALL TO GET THE MET OFFICE FILE FROM THE INTERNET
response = urllib.request.urlopen('http://datapoint.metoffice.gov.uk/public/data/val/wxobs/all/json/3351?res=hourly&key=<my key>')
FCData = response.read()
FCDataStr = FCData.decode('utf-8')
###   END OF THE CALL TO GET MET OFFICE FILE FROM THE INTERNET

#Converts JSON data to a dictionary object
FCData_Dic = json.loads(FCDataStr)

# Open output file for appending
fName = '<my filename>'
if not os.path.exists(fName):
    print(fName, 'does not exist')
    sys.exit()
fOut = open(fName, 'a')

# Loop through each day, will nearly always be 2 days,
# unless run at midnight. 
i = 0
j = 0
for k in range(24):
    # there will be 24 values altogether
    # find the first hour value for the first day
    DateZ = (FCData_Dic['SiteRep']['DV']['Location']['Period'][i]['value'])
    hhmm = (FCData_Dic['SiteRep']['DV']['Location']['Period'][i]['Rep'][j]['$'])
    Temperature = (FCData_Dic['SiteRep']['DV']['Location']['Period'][i]['Rep'][j]['T'])
    Humidity = (FCData_Dic['SiteRep']['DV']['Location']['Period'][i]['Rep'][j]['H'])
    DewPoint = (FCData_Dic['SiteRep']['DV']['Location']['Period'][i]['Rep'][j]['Dp'])
    recordStr = '{},{},{},{},{}\n'.format(DateZ,hhmm,Temperature,Humidity,DewPoint)
    fOut.write(recordStr)
    j = j + 1
    if (hhmm == '1380'):
        i = i + 1
        j = 0
fOut.close()
print('Records added to ',fName)
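As a possible variation, the same rows can be produced by looping over each Period and each Rep directly, which avoids the fixed count of 24 and the manual i/j bookkeeping. This is only a sketch: write_observations and the sample dictionary are made up for illustration, the sample values are invented, and io.StringIO stands in for the output file so it runs offline.

```python
import csv
import io

def write_observations(data, out_file):
    """Write one CSV row per observation; return the number of rows written."""
    writer = csv.writer(out_file)
    count = 0
    for period in data['SiteRep']['DV']['Location']['Period']:
        for rep in period['Rep']:
            # .get() leaves a blank cell if a reading is missing from a report.
            writer.writerow([period['value'], rep['$'],
                             rep.get('T', ''), rep.get('H', ''), rep.get('Dp', '')])
            count += 1
    return count

# Hypothetical two-day sample in the shape the code above expects.
sample = {'SiteRep': {'DV': {'Location': {'Period': [
    {'value': '2014-07-29Z', 'Rep': [{'$': '1380', 'T': '15.1', 'H': '80.2', 'Dp': '11.7'}]},
    {'value': '2014-07-30Z', 'Rep': [{'$': '0', 'T': '14.6', 'H': '82.0'}]},
]}}}}

buffer = io.StringIO()   # stands in for the file opened for appending
rows_written = write_observations(sample, buffer)
print(rows_written)
print(buffer.getvalue())
```

With a real file, passing newline='' to open() is the usual advice when using the csv module.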

Upvotes: 0

For other clueless people who may want to use the UK Met Office 3-hourly forecast data feed, below is the solution that I am using:

import urllib.request
import json

###  THIS IS THE CALL TO GET THE MET OFFICE FILE FROM THE INTERNET
response = urllib.request.urlopen('http://datapoint.metoffice.gov.uk/public/data/val/wxfcs/all/json/**YourLocationID**?res=3hourly&key=**your_api_key**')
FCData = response.read()
FCDataStr = FCData.decode('utf-8')
###   END OF THE CALL TO GET MET OFFICE FILE FROM THE INTERNET

#Converts JSON data to a dictionary object
FCData_Dic = json.loads(FCDataStr)

#The following are examples of extracting data from the dictionary object.
#The JSON data is heavily nested.
#Each [] goes one level down, usually defined with {} in the JSON data.
dataDate = (FCData_Dic['SiteRep']['DV']['dataDate'])
print('dataDate =',dataDate)

#There are also [] in the JSON data, which are referenced with integers, 
# starting from [0]
#Here, the [0] refers to the first day's block of data defined with [].
DateDay0 = (FCData_Dic['SiteRep']['DV']['Location']['Period'][0]['value'])
print('DateDay0 =',DateDay0)

#The second [0] picks out the first forecast of the first day; here we read its time, referenced by '$'
TimeOfFC = (FCData_Dic['SiteRep']['DV']['Location']['Period'][0]['Rep'][0]['$'])
print('TimeOfFC =',TimeOfFC)

#Ditto for the temperature.    
Temperature = int((FCData_Dic['SiteRep']['DV']['Location']['Period'][0]['Rep'][0]['T']))
print('Temperature =',Temperature)

#Ditto for the weather Type (a code number).
WeatherType = int((FCData_Dic['SiteRep']['DV']['Location']['Period'][0]['Rep'][0]['W']))
print('WeatherType =',WeatherType)

I hope this helps somebody!
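To get every 3-hourly forecast rather than just the first one, the same indexing can be put inside two nested loops. The sketch below is one way to do that; flatten_forecast and SAMPLE are names invented here, SAMPLE is a cut-down copy of the data shown in the question, and I am assuming '$' is the number of minutes after midnight (the values 0, 180, ... 1260 suggest so).

```python
import json

# Trimmed-down sample in the same shape as the feed shown in the question.
SAMPLE = '''
{"SiteRep": {"DV": {"dataDate": "2014-07-29T20:00:00Z", "Location": {"name": "USK", "Period": [
  {"type": "Day", "value": "2014-07-29Z", "Rep": [
    {"T": "24", "S": "9", "W": "7", "$": "900"},
    {"T": "22", "S": "11", "W": "8", "$": "1080"}]},
  {"type": "Day", "value": "2014-07-30Z", "Rep": [
    {"T": "14", "S": "7", "W": "0", "$": "0"}]}
]}}}}
'''

def flatten_forecast(data):
    """Return one dict per forecast step: date, minutes, temperature, wind speed."""
    rows = []
    for period in data['SiteRep']['DV']['Location']['Period']:
        for rep in period['Rep']:
            rows.append({
                'date': period['value'],
                'minutes': int(rep['$']),       # assumed: minutes after midnight
                'temperature_c': int(rep['T']),
                'wind_mph': int(rep['S']),
            })
    return rows

rows = flatten_forecast(json.loads(SAMPLE))
for row in rows:
    print(row['date'], row['minutes'], row['temperature_c'], row['wind_mph'])
```

Each entry in rows can then be fed to whatever does the heating calculation, without caring how many days or steps the feed happened to contain.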

Upvotes: 2

abarnert

Reputation: 365995

This is the problem:

FCDataStr = str(FCData)

When you call str on a bytes object, what you get is the string representation of a bytes object—in quotes, with a b prefix, and with special characters backslash-escaped.
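A small offline illustration of the difference, using a short bytes literal as a stand-in for what response.read() returns (the variable names here are just for the example):

```python
import json

raw = b'{"name": "USK"}'        # stand-in for response.read()
as_repr = str(raw)              # the repr of the bytes object, b'...' included
as_text = raw.decode('utf-8')   # the actual text the bytes encode

print(as_repr)
print(as_text)

# The repr is not valid JSON, so parsing it fails at the very first character.
try:
    json.loads(as_repr)
    repr_parsed = True
except ValueError:
    repr_parsed = False

print(repr_parsed)
print(json.loads(as_text)['name'])
```

The failure on as_repr is the same "Expecting value: line 1 column 1 (char 0)" error from the question: the parser trips over the leading b.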

If you want to decode the binary data to text, you have to do that with the decode method:

FCDataStr = FCData.decode('utf-8')

(I'm guessing UTF-8 because JSON is always supposed to be in UTF-8 unless otherwise specified.)


In more detail:

urllib.request.urlopen returns an http.client.HTTPResponse, which is a binary file-like object (it implements io.RawIOBase).

You can't pass that to json.load because it wants a text-file-like object: something with a read method that returns str, not bytes. You could wrap your HTTPResponse in an io.BufferedReader, then wrap that in an io.TextIOWrapper (with encoding='utf-8'), then pass that to json.load, but that's probably more work than you want to do.
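For the curious, the wrapping approach might look roughly like the sketch below. load_json_from_binary is a hypothetical helper name, and io.BytesIO stands in for the HTTP response so the example runs without a network call or an API key. BytesIO is already buffered, so the io.BufferedReader step is left out here; with a real raw response you may need it as described above.

```python
import io
import json

def load_json_from_binary(binary_stream, encoding='utf-8'):
    """Decode a binary file-like object on the fly and parse it as JSON."""
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding)
    return json.load(text_stream)

# io.BytesIO stands in for the HTTP response so the sketch runs offline.
fake_response = io.BytesIO(b'{"SiteRep": {"DV": {"dataDate": "2014-07-29T20:00:00Z"}}}')
data = load_json_from_binary(fake_response)
print(data['SiteRep']['DV']['dataDate'])  # 2014-07-29T20:00:00Z
```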

So, the simplest thing to do is exactly what you were trying to do, just using decode instead of str:

data_bytes = response.read()
data_str = data_bytes.decode('utf-8')
data_dict = json.loads(data_str)


Then, don't try to access the data in data_str—that's just a string, representing the JSON encoding of your data; data_dict is the actual data.

For example, to find the dataDate of the DV of the SiteRep, you just do this:

data_dict['SiteRep']['DV']['dataDate']

That will get you the string '2014-07-31T14:00:00Z'. You'll probably still want to convert that to a datetime.datetime object (because JSON only understands a few basic types: strings, numbers, lists, and dicts). But it's still a lot better than trying to pick it out of data_str by find-ing or guessing at the offsets.
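One way to do that conversion, assuming the timestamp always has the exact shape shown (a trailing Z meaning UTC, no fractional seconds); parse_data_date is just a name picked for this sketch:

```python
from datetime import datetime, timezone

def parse_data_date(stamp):
    """Parse a timestamp such as '2014-07-31T14:00:00Z' as an aware UTC datetime."""
    naive = datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%SZ')
    return naive.replace(tzinfo=timezone.utc)

when = parse_data_date('2014-07-31T14:00:00Z')
print(when.isoformat())  # 2014-07-31T14:00:00+00:00
```

If the feed ever uses a different layout, strptime raises ValueError, which is at least a loud failure rather than a silently wrong date.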


My guess is that you've found some sample code written for Python 2.x, where you can convert between byte strings and Unicode strings just by calling the appropriate constructors, without specifying an encoding, which would default to sys.getdefaultencoding(), and often (at least on Mac or most modern Linux distros) that's UTF-8, so it just happened to work despite being wrong. In which case you may want to find some better sample code to learn from…

Upvotes: 0
