Reputation: 1932
I got some json data from an API. I used json.loads and then printed that to the REPL which is shown below.
{'warnings': {'query': {'*': "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."}}, 'query-continue': {'links': {'plcontinue': '25618423|10|R_from_other_capitalisation', 'gplcontinue': "15095968|0|1991_US_Open_-_Women's_Doubles"}}, 'query': {'pages': {'32203010': {'pageid': 32203010, 'title': "1988 Australian Open - Women's Doubles", 'ns': 0}, '25618558': {'pageid': 25618558, 'title': "1984 Wimbledon Championships - Women's Singles", 'ns': 0}, '29486043': {'pageid': 29486043, 'title': "1984 Wimbledon Championships - Women's Doubles", 'ns': 0}, '25618819': {'pageid': 25618819, 'title': "1986 US Open - Women's Singles", 'ns': 0}, '25619314': {'pageid': 25619314, 'title': "1989 US Open - Women's Singles", 'ns': 0}, '25618668': {'pageid': 25618668, 'title': "1985 US Open - Women's Singles", 'ns': 0}, '25618857': {'pageid': 25618857, 'title': "1987 Australian Open - Women's Singles", 'ns': 0}, '25618423': {'links': [{'title': "1983 Wimbledon Championships – Women's Singles", 'ns': 0}, {'title': 'Wikipedia:Mainspace', 'ns': 4}, {'title': 'Template:R from long name', 'ns': 10}], 'pageid': 25618423, 'title': "1983 Wimbledon Championships - Women's Singles", 'ns': 0}, '23826062': {'links': [{'title': "1984 French Open – Women's Singles", 'ns': 0}, {'title': 'Wikipedia:Mainspace', 'ns': 4}, {'title': 'Template:R from long name', 'ns': 10}, {'title': 'Template:R from other capitalisation', 'ns': 10}, {'title': 'Template:R from plural', 'ns': 10}, {'title': 'Template:R from short name', 'ns': 10}, {'title': 'Category:Redirects from modifications', 'ns': 14}], 'pageid': 23826062, 'title': "1984 French Open - Women's Singles", 'ns': 0}, '25619177': {'pageid': 25619177, 'title': "1989 Australian Open - Women's Singles", 'ns': 0}}}}
Then I copied that data from the repl to a .py module and assigned to a variable so I can perform some unit tests. But I keep getting this error:
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte
What is going on?
Update: The exact way I got the error. Using Visual Studio I ran a script which grabbed data using Requests and .text to get the content. I then applied json.loads. I printed this to the Visual Studio Python 3.4 Interactive (aka REPL). Then I copied using a mouse from this REPL and pasted into a .py file in Visual Studio.
Update 2: So when I grab the data I use Requests and then the text property. When I print this without json.loads its fine. However if I copy this "more raw" from the REPL and paste not its no longer a string but an object and JSON loads won't work. Does the python 3 print function print objects even though it should be a json?
This is the raw no json.loads output from the API using Requests.text:
{"warnings":{"query":{"*":"Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."}},"query-continue":{"links":{"plcontinue":"25618423|10|R_from_other_capitalisation","gplcontinue":"15095968|0|1991_US_Open_-_Women's_Doubles"}},"query":{"pages":{"25618423":{"pageid":25618423,"ns":0,"title":"1983 Wimbledon Championships - Women's Singles","links":[{"ns":0,"title":"1983 Wimbledon Championships \u2013 Women's Singles"},{"ns":4,"title":"Wikipedia:Mainspace"},{"ns":10,"title":"Template:R from long name"}]},"23826062":{"pageid":23826062,"ns":0,"title":"1984 French Open - Women's Singles","links":[{"ns":0,"title":"1984 French Open \u2013 Women's Singles"},{"ns":4,"title":"Wikipedia:Mainspace"},{"ns":10,"title":"Template:R from long name"},{"ns":10,"title":"Template:R from other capitalisation"},{"ns":10,"title":"Template:R from plural"},{"ns":10,"title":"Template:R from short name"},{"ns":14,"title":"Category:Redirects from modifications"}]},"29486043":{"pageid":29486043,"ns":0,"title":"1984 Wimbledon Championships - Women's Doubles"},"25618558":{"pageid":25618558,"ns":0,"title":"1984 Wimbledon Championships - Women's Singles"},"25618668":{"pageid":25618668,"ns":0,"title":"1985 US Open - Women's Singles"},"25618819":{"pageid":25618819,"ns":0,"title":"1986 US Open - Women's Singles"},"25618857":{"pageid":25618857,"ns":0,"title":"1987 Australian Open - Women's Singles"},"32203010":{"pageid":32203010,"ns":0,"title":"1988 Australian Open - Women's Doubles"},"25619177":{"pageid":25619177,"ns":0,"title":"1989 Australian Open - Women's Singles"},"25619314":{"pageid":25619314,"ns":0,"title":"1989 US Open - Women's Singles"}}}}
Upvotes: 2
Views: 10017
Reputation: 177471
There are EN DASH
(U+2013) characters in your text. In the Windows-1252
codec they map to the byte \x96
. You've got encoding problems, but exactly why depends on the steps you took to copy the text to the .py
file. I cut-and-pasted the text in your question into Notepad++ with encoding set to ANSI
and assigned it to a variable and simply got:
File "C:\temp.py", line 1
SyntaxError: unknown decode error
But selecting UTF-8
or UTF-8 without BOM
as the encoding it works correctly. Python 3 assumes UTF-8 if there is no #coding:
comment declaring the source encoding.
Note that ANSI
on my US Windows system is really Windows-1252
. Using ANSI
and adding #coding:windows-1252
also works correctly. Python needs to know the source encoding if it is different from the default (ascii
on Python 2 and utf-8
on Python 3).
Upvotes: 4