Reputation: 7068
Following this documents, I was trying to load JSON Data from S3 to RedShift.
Created JSONPath file & validated (on https://jsonpath.curiousconcept.com/# with expression $.*
)
{
"jsonpaths": [
"$['_record_id']",
"$['_title']",
"$['_server_updated_at']",
"$['_project']",
"$['_assigned_to']",
"$['_updated_by']",
"$['_latitude']",
"$['_longitude']",
"$['date']",
"$['date_received']",
"$['inspection_type']"
]
}
and sample data
[{
"_record_id": "cf68c930-b7c8-4c3f-a04c-58b49f383cca",
"_title": "FAIL, 128",
"_server_updated_at": "2021-08-03T15:06:05.000Z",
"_project": null,
"_assigned_to": null,
"_updated_by": "XYZ",
"_geometry": {
"type": "Point",
"coordinates": [-74.5048900706, 40.3395964363]
},
"_latitude": 40.3395964363,
"_longitude": -74.5048900706,
"date": "2021-08-03T00:00:00.000Z",
"date_received": "2021-07-30T00:00:00.000Z",
"inspection_type": "New Product Inspection"
}, {
"_record_id": "9c8af79a-eaaf-405e-8c42-62560fdf15d5",
"_title": "PASS, 52",
"_server_updated_at": "2021-08-03T14:56:23.000Z",
"_project": null,
"_assigned_to": null,
"_updated_by": "XYZ",
"_geometry": null,
"_latitude": null,
"_longitude": null,
"date": "2021-08-03T00:00:00.000Z",
"date_received": "2021-07-30T00:00:00.000Z",
"inspection_type": "New Product Inspection"
}]
When I run this COPY command
copy rab.rab_dbo.shipmentreceivinglog2
from 's3://<bucket>/data_report.json'
iam_role 'arn:aws:iam::1234567890:role/RedshiftFileTransfer'
json 's3://<bucket>g/JSONPaths.json';
I get ERROR: Load into table 'shipmentreceivinglog2' failed. Check 'stl_load_errors' system table for details.
When I run select * from stl_load_errors;
I see
Invalid JSONPath format: Member is not an object.
for s3://<bucket>/data_report.json
Whats wrong with my JSONPath File ?
Upvotes: 0
Views: 880
Reputation: 11102
The issue is with your data file. Redshift json input data needs to be a set of json records just smashed together. You have a file that is one json array of objects. An array is one thing. You need to take out the enclosing [] and the commas between elements. Your sample data should look like
{
"_record_id": "cf68c930-b7c8-4c3f-a04c-58b49f383cca",
"_title": "FAIL, 128",
"_server_updated_at": "2021-08-03T15:06:05.000Z",
"_project": null,
"_assigned_to": null,
"_updated_by": "XYZ",
"_geometry": {
"type": "Point",
"coordinates": [-74.5048900706, 40.3395964363]
},
"_latitude": 40.3395964363,
"_longitude": -74.5048900706,
"date": "2021-08-03T00:00:00.000Z",
"date_received": "2021-07-30T00:00:00.000Z",
"inspection_type": "New Product Inspection"
}
{
"_record_id": "9c8af79a-eaaf-405e-8c42-62560fdf15d5",
"_title": "PASS, 52",
"_server_updated_at": "2021-08-03T14:56:23.000Z",
"_project": null,
"_assigned_to": null,
"_updated_by": "XYZ",
"_geometry": null,
"_latitude": null,
"_longitude": null,
"date": "2021-08-03T00:00:00.000Z",
"date_received": "2021-07-30T00:00:00.000Z",
"inspection_type": "New Product Inspection"
}
An easy way to get this is to pump the json you have through jq.
jq '.[]' file.json
Upvotes: 1