Reputation: 217
I have a JSON file with the following contents:
"1034280": {
"transcript": ["1040560",
"Ok, so what Akanksha is saying is that..."],
"interaction": "Student Question"
},
"1041600": {
"transcript": ["1044860",
"this is also not correct because it will take some time."],
"explain": "Describing&Interpreting"
},
"1046800": {
"transcript": ["1050620",
"So, what you have to do is, what is the closest to the answer?"],
"question": "FocusingInformation"
},
I want to extract the transcript sentences and concatenate them.
For example, I want the output to be:
"Ok, so what Akanksha is saying is that..." "this is also not correct because it will take some time." "So, what you have to do is, what is the closest to the answer?"
Upvotes: 0
Views: 190
Reputation: 58430
This might work for you (GNU sed):
sed '/{/,+2{//,+1d;s/^\s*\|],\s*$//g;H;};$!d;x;s/\n//;y/\n/ /' file
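For readers unfamiliar with the idioms used, the one-liner can be unpacked into a commented script. This is only a sketch: it assumes GNU sed and input shaped exactly like the sample in the question (a line containing "{", then the "transcript" line, then the sentence line). The function name extract_transcripts and the temporary file are illustrative and not part of the original answer.

```shell
# Sample input copied from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
"1034280": {
"transcript": ["1040560",
"Ok, so what Akanksha is saying is that..."],
"interaction": "Student Question"
},
"1041600": {
"transcript": ["1044860",
"this is also not correct because it will take some time."],
"explain": "Describing&Interpreting"
},
"1046800": {
"transcript": ["1050620",
"So, what you have to do is, what is the closest to the answer?"],
"question": "FocusingInformation"
},
EOF

# Hypothetical wrapper around the same sed commands, one per line (GNU sed only).
extract_transcripts() {
  sed '
    # Range: each line containing "{" plus the two lines after it.
    /{/,+2{
      # Drop the "{" line and the "transcript" line that follows it
      # (the empty pattern // reuses the last regex, here /{/).
      //,+1d
      # On the remaining sentence line, strip leading whitespace and the trailing "],".
      s/^\s*\|],\s*$//g
      # Append the cleaned sentence to the hold space.
      H
    }
    # Print nothing until the last input line.
    $!d
    # At the end, swap in the sentences collected in the hold space,
    x
    # remove the leading newline that the first H added,
    s/\n//
    # and turn the remaining newlines into spaces.
    y/\n/ /
  ' "$1"
}

result=$(extract_transcripts "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Because the script relies on line positions rather than on JSON structure, it will break if the layout changes, for example if a sentence itself contains "{" or the array is written on a single line.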
Upvotes: 1
Reputation: 2514
With the caveats, and with the input data in a file called data:
awk -F"]," 'BEGIN { ORS="" } /"transcript":/ {p=1} NF==2 && p==1 { sub( /^[[:space:]]*"/, (++i==1?"":" ")"\"", $1 ); print $1; p=0 } END { print "\n" }' data
outputs:
"Ok, so what Akanksha is saying is that..." "this is also not correct because it will take some time." "So, what you have to do is, what is the closest to the answer?"
Upvotes: 1
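The awk approach above can also be written over several lines, which makes the role of the flag easier to follow. This is a sketch under the same assumption as the answer (each sentence sits on its own line ending in "]," directly after a "transcript" line). The names join_transcripts, seen, and n are illustrative and do not come from the original answer.

```shell
# Sample input copied from the question, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
"1034280": {
"transcript": ["1040560",
"Ok, so what Akanksha is saying is that..."],
"interaction": "Student Question"
},
"1041600": {
"transcript": ["1044860",
"this is also not correct because it will take some time."],
"explain": "Describing&Interpreting"
},
"1046800": {
"transcript": ["1050620",
"So, what you have to do is, what is the closest to the answer?"],
"question": "FocusingInformation"
},
EOF

# Hypothetical helper: print all transcript sentences on one line, space-separated.
join_transcripts() {
  awk -F'],' '
    # Remember that a "transcript" key was seen; the sentence is on a later line.
    /"transcript":/ { seen = 1 }
    # A line ending in "]," splits into two fields (the second one empty).
    NF == 2 && seen {
      sub(/^[[:space:]]*/, "", $1)          # drop any indentation
      printf "%s%s", (n++ ? " " : ""), $1   # separator before every sentence but the first
      seen = 0                              # wait for the next "transcript" key
    }
    END { printf "\n" }
  ' "$1"
}

joined=$(join_transcripts "$data")
printf '%s\n' "$joined"
rm -f "$data"
```

Like the one-liner, this treats the file as lines of text rather than parsing it as JSON, so it only holds while the layout matches the sample.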