user1862399

Reputation: 217

Extract sentences from a json file using bash

I have a JSON file with the following contents:

"1034280": {
    "transcript": ["1040560",
    "Ok, so what Akanksha is saying is that..."],
    "interaction": "Student Question"
},
"1041600": {
    "transcript": ["1044860",
    "this is also not correct because it will take some time."],
    "explain": "Describing&Interpreting"
},
"1046800": {
    "transcript": ["1050620",
    "So, what you have to do is, what is the closest to the answer?"],
    "question": "FocusingInformation"
},

I want to extract the transcript sentences and concatenate them.

For example, I want the output to be:

"Ok, so what Akanksha is saying is that..." "this is also not correct because it will take some time." "So, what you have to do is, what is the closest to the answer?"

Upvotes: 0

Views: 190

Answers (2)

potong

Reputation: 58430

This might work for you (GNU sed):

sed '/{/,+2{//,+1d;s/^\s*\|],\s*$//g;H;};$!d;x;s/\n//;y/\n/ /' file

Upvotes: 1

n0741337

Reputation: 2514

With the caveats:

  • You should really use a JSON parsing library as indicated in the comments
  • This will likely only work if your data exactly matches the question
  • I'll leave deciphering the awk up to you as you didn't specify what you've tried

When the input data is in a file called data:

awk -F"]," 'BEGIN { ORS="" } /"transcript":/ {p=1} NF==2 && p==1 { sub( /^[[:space:]]*"/, (++i==1?"":" ")"\"", $1 ); print $1; p=0 } END { print "\n" }' data

outputs:

"Ok, so what Akanksha is saying is that..." "this is also not correct because it will take some time." "So, what you have to do is, what is the closest to the answer?"
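For comparison, the first caveat (using a JSON parsing library) could look roughly like the sketch below, which uses Python's standard json module. It assumes the file is a bare run of "key": {...} members exactly as shown in the question, so it wraps the text in braces and drops the trailing comma before parsing. It also assumes every "transcript" is a two-element list of timestamp and sentence. The function name extract_transcripts and the shortened sample are illustrative only.

```python
import json


def extract_transcripts(fragment: str) -> str:
    """Return the transcript sentences, double-quoted and joined by spaces."""
    # Assumption: the input is a bare sequence of "key": {...} members with a
    # trailing comma, as in the question, so wrap it to form one JSON object.
    body = fragment.strip().rstrip(",")
    data = json.loads("{" + body + "}")
    # Assumption: each "transcript" is [timestamp, sentence]; take the sentence.
    sentences = [entry["transcript"][1] for entry in data.values()]
    # json.dumps re-adds the double quotes and escapes any embedded quotes.
    return " ".join(json.dumps(s, ensure_ascii=False) for s in sentences)


# Shortened sample in the same shape as the data in the question.
sample = '''
"1034280": {
    "transcript": ["1040560",
    "Ok, so what Akanksha is saying is that..."],
    "interaction": "Student Question"
},
"1041600": {
    "transcript": ["1044860",
    "this is also not correct because it will take some time."],
    "explain": "Describing&Interpreting"
},
'''

print(extract_transcripts(sample))
```

To run it against the file called data, read the file's text and pass it to extract_transcripts. Unlike the awk above, this does not depend on how the lines are broken, only on the text parsing as JSON once wrapped.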

Upvotes: 1
