Reputation: 209
I am trying to query data from json data in s3-select.
{
person = [
{
"Id": 1,
"Name": "Anshu",
"Address": "Templestowe",
"Car": "Jeep"
}
{
"Id": 2,
"Name": "Ben Mostafa",
"Address": "Las Vegas",
"Car": "Mustang"
}
{
"Id": 3,
"Name": "Rohan Wood",
"Address": "Wooddon",
"Car": "VW"
}
]
}
QUERY = "select * from S3Object s"
QUERY = "select s.person from S3Object s"
QUERY = "select s.person[0] from S3Object s"
QUERY = "select s.person[0].Name from S3Object s"
All these queries work fine and returns the respective object as desired but when i am trying to search data on name/Car, it doesn't work.
QUERY = "select * from S3Object s where s.person.Name = \"Anshu\" "
error: com.amazonaws.services.s3.model.AmazonS3Exception: The column index at line 1, column 32 is invalid.
There is not much related content available on s3-select online. Wondering whether we can query on field name or not! There are no examples of select query with where clause for s3-select given in the documentation
Upvotes: 6
Views: 21017
Reputation: 315
You can't do it that way. You need to "flatten" your JSON somewhat so it looks like this:
{
person: {
"Id": 1,
"Name": "Anshu",
"Address": "Templestowe",
"Car": "Jeep"
}
}
{
person: {
"Id": 2,
"Name": "Ben Mostafa",
"Address": "Las Vegas",
"Car": "Mustang"
}
}
{
person:{
"Id": 3,
"Name": "Rohan Wood",
"Address": "Wooddon",
"Car": "VW"
}
}
The query below will work as expcted then
select * from s3object s where s.person.name= 'Anshu'
Upvotes: 1
Reputation: 107
After reading the AWS doc, I find following SQL works fine.
select * from S3Object[*].person[*] as p where p.Name='Anshu'
This SQL will give you all the person whose name is 'Anshu', like:
{
"Id": 1,
"Name": "Anshu",
"Address": "Templestowe",
"Car": "Jeep"
}
When you see [*]
, it means a json array.
Amazon S3 Select always treats a JSON document as an array of root-level values, so we use S3Object[*]
in the SQL. And person
value is a array, so we use person[*]
in the SQL.
Upvotes: 5
Reputation: 1652
I can't find this in any of AWS documentations, but I was just playing around and discovered a working syntax:
QUERY = "select * from S3Object s where 'Anshu' in s.person[*].Name"
Based on some deductions:
Proof with Python and Boto3:
import boto3
S3_BUCKET = 'your-bucket-name'
s3 = boto3.client('s3')
r = s3.select_object_content(
Bucket=S3_BUCKET,
Key='your-file-name.json',
ExpressionType='SQL',
Expression="select * from s3object s where 'Anshu' in s.person[*].Name",
InputSerialization={'JSON': {"Type": "Lines"}},
OutputSerialization={'JSON': {}}
)
for event in r['Payload']:
if 'Records' in event:
records = event['Records']['Payload'].decode('utf-8')
print(records)
Weird, I know. Remember to set [default] credentials in ~/.aws/credentials file.
Upvotes: 12