Reputation: 374
I have a JSON file, 'OpenEnded_mscoco_val2014.json'. The file contains 121,512 questions.
Here is a sample:
"questions": [
{
"question": "What is the table made of?",
"image_id": 350623,
"question_id": 3506232
},
{
"question": "Is the food napping on the table?",
"image_id": 350623,
"question_id": 3506230
},
{
"question": "What has been upcycled to make lights?",
"image_id": 350623,
"question_id": 3506231
},
{
"question": "Is this an Spanish town?",
"image_id": 8647,
"question_id": 86472
}
]
I used jq -r '.questions | [map(.question), map(.image_id), map(.question_id)] | @csv' OpenEnded_mscoco_val2014_questions.json >> temp.csv
to convert the JSON into CSV.
But the CSV output has the questions followed by the image_ids, which is what the command above produces.
The expected output is:
"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230
Also, is it possible to keep only the results having image_id <= 10000, and to group questions that share the same image_id? E.g. results 1, 2 and 3 of the JSON could be combined into one record with 3 questions, 1 image_id and 3 question_ids.
EDIT: The first problem is solved by the possible duplicate question. I would like to know whether it is possible to use a comparison operator on the command line in jq while converting the JSON file, in this case to get all fields from the JSON only if image_id <= 10000.
Upvotes: 0
Views: 253
Reputation: 14655
With the -r option, the following filter
.questions[] | [ .[] ] | @csv
produces
"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230
"What has been upcycled to make lights?",350623,3506231
"Is this an Spanish town?",8647,86472
To filter the data, use select. E.g., with the -r option, the following filter
.questions[] | select(.image_id <= 10000) | [ .[] ] | @csv
produces the subset
"Is this an Spanish town?",8647,86472
To group the data use group_by. The following filter
.questions
| group_by(.image_id)[]
| [ .[] | [ .[] ] | @csv ]
produces grouped data
[
"\"Is this an Spanish town?\",8647,86472"
]
[
"\"What is the table made of?\",350623,3506232",
"\"Is the food napping on the table?\",350623,3506230",
"\"What has been upcycled to make lights?\",350623,3506231"
]
This isn't very useful in this form and is probably not exactly what you want, but it demonstrates the basic approach.
Upvotes: 0
Reputation: 116780
1) Given your input (suitably elaborated to make it valid JSON), the following query generates the CSV output as shown:
$ jq -r '.questions[] | [.question, .image_id, .question_id] | @csv'
"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230
"What has been upcycled to make lights?",350623,3506231
"Is this an Spanish town?",8647,86472
The key thing to remember here is that @csv requires a flat array, but as with all jq filters, you can feed it a stream.
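To make the flat-array requirement concrete, here is a minimal sketch. It assumes a hypothetical two-question sample written to a temporary file, and it uses jq's transpose builtin (jq 1.5 or later) to turn the column-wise arrays from the question into one flat array per row:

```shell
# Hypothetical two-question sample, written to a temp file for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"questions": [
  {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
  {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472}
]}
EOF

# [map(...), map(...), map(...)] is an array of three column arrays, which
# @csv cannot format. transpose regroups the columns into one flat array
# per question, and .[] streams those arrays to @csv one at a time.
rows=$(jq -r '.questions
  | [map(.question), map(.image_id), map(.question_id)]
  | transpose[]
  | @csv' "$sample")
echo "$rows"
rm -f "$sample"
```

This prints one CSV line per question. Building the row directly with [.question, .image_id, .question_id], as in the query above, is simpler; the transpose variant is only meant to show how the column-wise attempt relates to it.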
2) To filter using the criterion .image_id <= 10000, just interpose the appropriate select/1 filter:
.questions[]
| select(.image_id <= 10000)
| [.question, .image_id, .question_id]
| @csv
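If the threshold should come from the command line rather than being hard-coded in the filter, one option is jq's --argjson flag. The following is a sketch under assumptions: the sample file and the variable name $max are hypothetical, chosen only for illustration.

```shell
# Hypothetical two-question sample, written to a temp file for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"questions": [
  {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
  {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472}
]}
EOF

# --argjson binds $max to the JSON number 10000, so <= compares numerically.
filtered=$(jq -r --argjson max 10000 '.questions[]
  | select(.image_id <= $max)
  | [.question, .image_id, .question_id]
  | @csv' "$sample")
echo "$filtered"
rm -f "$sample"
```

With this sample only the image_id 8647 row is printed. Note that --arg would bind $max as a string instead, and since jq orders all numbers before all strings, the comparison would then be true for every numeric image_id.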
3) To sort by image_id, use sort_by(.image_id):
.questions
| sort_by(.image_id)
|.[]
| [.question, .image_id, .question_id]
| @csv
4) To group by .image_id, you would pipe the output of the following pipeline into your own pipeline:
.questions | group_by(.image_id)
You will, however, have to decide exactly how you want to combine the grouped objects.
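As one possible way to combine each group, the sketch below emits a single CSV row per image_id, with the questions joined by "; " and the question_ids joined by spaces. The sample file, the separators, and the row layout are assumptions made for illustration, not something the question specifies.

```shell
# Hypothetical three-question sample, written to a temp file for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"questions": [
  {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
  {"question": "Is the food napping on the table?", "image_id": 350623, "question_id": 3506230},
  {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472}
]}
EOF

# group_by yields one array of objects per image_id (sorted by image_id).
# Each group is collapsed into a flat three-element array for @csv:
# joined questions, the shared image_id, and the joined question_ids.
grouped=$(jq -r '.questions
  | group_by(.image_id)[]
  | [ (map(.question) | join("; ")),
      .[0].image_id,
      (map(.question_id | tostring) | join(" ")) ]
  | @csv' "$sample")
echo "$grouped"
rm -f "$sample"
```

For this sample the output has two rows, the first being "Is this an Spanish town?",8647,"86472". The joined question_ids are strings, so @csv quotes them; a different layout (for example, one column per question) would need a different combining step.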
Upvotes: 1