Reputation: 1587
I am having a hard time converting the JSON string shown below to CSV using Pandas.
Here is my example string (it could also be read from a file):
{
"count": 8,
"facets": [],
"results": [
{
"protocol": "DWC_ARCHIVE",
"taxonKey": 4332928,
"family": "Diaptomidae",
"institutionCode": "MNHN",
"lastInterpreted": "2017-05-17T13:20:23.744+0000",
"speciesKey": 4332928,
"gbifID": "694182141",
"identifiedBy": "Dussart B.",
"lastParsed": "2017-05-17T13:19:47.003+0000",
"phylum": "Arthropoda",
"orderKey": 679,
"facts": [],
"species": "Diaptomus kenitraensis",
"issues": [],
"occurrenceID": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2010-6707",
"countryCode": null,
"basisOfRecord": "PRESERVED_SPECIMEN",
"relations": [],
"classKey": 203,
"catalogNumber": "2010-6707",
"scientificName": "Diaptomus kenitraensis Kiefer, 1926",
"taxonRank": "SPECIES",
"familyKey": 9038,
"kingdom": "Animalia",
"publishingOrgKey": "2cd829bb-b713-433d-99cf-64bef11e5b3e",
"collectionCode": "IU",
"kingdomKey": 1,
"genusKey": 2114554,
"key": 694182141,
"phylumKey": 54,
"genericName": "Diaptomus",
"class": "Maxillopoda",
"crawlId": 116,
"individualCount": 1,
"publishingCountry": "FR",
"identifier": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2010-6707",
"lastCrawled": "2017-08-03T14:05:37.635+0000",
"license": "http://creativecommons.org/licenses/by/4.0/legalcode",
"datasetKey": "da6a07ed-9eee-460d-9448-910f542c1a7b",
"specificEpithet": "kenitraensis",
"identifiers": [],
"modified": "2015-06-19T19:23:01.000+0000",
"extensions": {},
"genus": "Diaptomus",
"order": "Calanoida"
},
{
"protocol": "DWC_ARCHIVE",
"taxonKey": 4332928,
"family": "Diaptomidae",
"institutionCode": "MNHN",
"lastInterpreted": "2017-05-17T13:19:51.210+0000",
"speciesKey": 4332928,
"gbifID": "440012453",
"identifiedBy": "Dussart B.",
"lastParsed": "2017-05-17T13:19:31.422+0000",
"phylum": "Arthropoda",
"orderKey": 679,
"facts": [],
"species": "Diaptomus kenitraensis",
"issues": [],
"occurrenceID": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2007-1537",
"countryCode": null,
"basisOfRecord": "PRESERVED_SPECIMEN",
"relations": [],
"classKey": 203,
"catalogNumber": "2007-1537",
"scientificName": "Diaptomus kenitraensis Kiefer, 1926",
"taxonRank": "SPECIES",
"familyKey": 9038,
"kingdom": "Animalia",
"publishingOrgKey": "2cd829bb-b713-433d-99cf-64bef11e5b3e",
"collectionCode": "IU",
"kingdomKey": 1,
"genusKey": 2114554,
"key": 440012453,
"phylumKey": 54,
"genericName": "Diaptomus",
"class": "Maxillopoda",
"crawlId": 116,
"individualCount": 8,
"publishingCountry": "FR",
"identifier": "http://coldb.mnhn.fr/catalognumber/mnhn/iu/2007-1537",
"lastCrawled": "2017-08-03T14:05:30.146+0000",
"license": "http://creativecommons.org/licenses/by/4.0/legalcode",
"datasetKey": "da6a07ed-9eee-460d-9448-910f542c1a7b",
"specificEpithet": "kenitraensis",
"identifiers": [],
"modified": "2015-06-19T19:23:00.000+0000",
"extensions": {},
"genus": "Diaptomus",
"order": "Calanoida"
}
],
"endOfRecords": false,
"limit": 2,
"offset": 0
}
What is of interest to me is the "results" part.
Using Pandas, I tried this:
df = pd.read_json(json_string)
df.to_csv("output.csv", index=False, sep='\t', encoding="utf-8")
But I got the error below:
File "C:\Python27\lib\site-packages\pandas\io\json.py", line 281, in read_json
date_unit).parse()
File "C:\Python27\lib\site-packages\pandas\io\json.py", line 349, in parse
self._parse_no_numpy()
File "C:\Python27\lib\site-packages\pandas\io\json.py", line 566, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None)
TypeError: Expected String or Unicode
I also tried most of the more verbose suggestions from "How can I convert JSON to CSV?" in an attempt to convert the above JSON directly into CSV (bypassing Pandas), but without success.
Could anyone give me a hint? Thanks in advance for any assistance you can provide.
Best regards,
Reputation: 862611
You can use json_normalize:
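import json
from pandas.io.json import json_normalize  # in pandas >= 1.0, use pandas.json_normalize instead

# Load the whole JSON document into a Python dict
with open('file.json') as data_file:
    data = json.load(data_file)

# Build one row per entry of the "results" list
df = json_normalize(data, 'results')
df.to_csv("output.csv", index=False, sep='\t', encoding="utf-8")  # write a tab-separated file
print(df)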
basisOfRecord catalogNumber class classKey collectionCode \
0 PRESERVED_SPECIMEN 2010-6707 Maxillopoda 203 IU
1 PRESERVED_SPECIMEN 2007-1537 Maxillopoda 203 IU
countryCode crawlId datasetKey extensions facts \
0 None 116 da6a07ed-9eee-460d-9448-910f542c1a7b {} []
1 None 116 da6a07ed-9eee-460d-9448-910f542c1a7b {} []
... protocol publishingCountry \
0 ... DWC_ARCHIVE FR
1 ... DWC_ARCHIVE FR
publishingOrgKey relations \
0 2cd829bb-b713-433d-99cf-64bef11e5b3e []
1 2cd829bb-b713-433d-99cf-64bef11e5b3e []
scientificName species speciesKey \
0 Diaptomus kenitraensis Kiefer, 1926 Diaptomus kenitraensis 4332928
1 Diaptomus kenitraensis Kiefer, 1926 Diaptomus kenitraensis 4332928
specificEpithet taxonKey taxonRank
0 kenitraensis 4332928 SPECIES
1 kenitraensis 4332928 SPECIES
[2 rows x 45 columns]
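If the JSON is already in memory rather than in a file, the same approach should work without open(). The sketch below assumes Python 3 and pandas 1.0 or later (where json_normalize is available as pandas.json_normalize). The helper name results_to_frame and the trimmed sample payload are made up for illustration. The helper accepts either raw JSON text or an already-parsed dict. That matters because the "Expected String or Unicode" error in the question is most likely what read_json raises when it is handed a dict instead of a string.

```python
import json

import pandas as pd

# Trimmed stand-in for the response in the question (hypothetical subset of fields).
json_string = """
{
  "count": 8,
  "results": [
    {"key": 694182141, "species": "Diaptomus kenitraensis", "countryCode": null, "issues": []},
    {"key": 440012453, "species": "Diaptomus kenitraensis", "countryCode": null, "issues": []}
  ],
  "limit": 2
}
"""


def results_to_frame(payload):
    """Return a DataFrame with one row per entry of the "results" list."""
    # Accept either raw JSON text or an already-parsed dict.
    data = json.loads(payload) if isinstance(payload, str) else payload
    return pd.json_normalize(data, record_path="results")


df = results_to_frame(json_string)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False, sep="\t")
print(csv_text)
```

To write a file instead, pass a path as in the answer above: df.to_csv("output.csv", index=False, sep="\t", encoding="utf-8"). Columns that hold lists or dicts (such as "issues" here) are written using their Python string representation.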