I'm trying to produce a JSON file from an SRT file using Python. With the code below, the output looks correct when I print it to the terminal, but when I write it to a JSON file, the file shows \n after every line. I don't know why that's happening. Can anyone help? Alternatively, if I can't produce the file, I just want to fetch values from the SRT file, such as the timing or the sentence. Suppose every sentence in the SRT has a startTime and an endTime: can I fetch the startTime without writing the output at all? I'm out of clues here, so any help is appreciated.
Here is the code:
import sys
import re
import json
regex = r'(?:\d+)\s(\d+:\d+:\d+,\d+) --> (\d+:\d+:\d+,\d+)\s+(.+?)(?:\n\n|$)'
offset_seconds = lambda ts: sum(howmany * sec for howmany, sec in zip(map(int, ts.replace(',', ':').split(':')), [60 * 60, 60, 1, 1e-3]))
transcript = [dict(startTime = offset_seconds(startTime), endTime = offset_seconds(endTime), ref = ' '.join(ref.split())) for startTime, endTime, ref in re.findall(regex, open('My_SRT_File.srt').read(), re.DOTALL)]
a=json.dumps(transcript, ensure_ascii = False, indent = 2)
with open('data.json','w', encoding='utf-8') as f:
    json.dump(a,f, ensure_ascii=False, indent=4)
Output in the produced JSON file:
"[\n {\n "startTime": 30.0,\n "endTime": 36.0,\n "ref": "Provided by YTS.LT"\n },\n {\n "startTime": 36.5,\n "endTime": 42.0,\n "ref": "Find the official YIFY movies site at https://YTS.LT"\n },\n {\n "startTime": 1830.0,\n "endTime": 1836.0,\n "ref": "Downloaded from YTS.LT"\n },\n {\n "startTime": 3630.0,\n "endTime": 3636.0,\n "ref": "Download more movies for free from YTS.LT"\n }\n]"
As you can see, the whole thing is written as one quoted string with literal \n sequences everywhere, instead of as a JSON array.
You are converting a Python object to a string with json.dumps and then writing that string out with json.dump, but json.dump can write your object out directly.
The reason you see the \n characters is that json.dump treats your string as a value to encode: it wraps the whole string in quotes and escapes the newlines that indent=2 put into it as \n. Pass the Python object itself (here, the list of dictionaries) directly to json.dump instead.
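A minimal demonstration of the double encoding, using a small hypothetical list in place of the parsed transcript: json.dumps already returns a str, and encoding that str a second time produces a single JSON string literal with escaped newlines and quotes, which is exactly what ended up in data.json.

```python
import json

# Hypothetical stand-in for the parsed transcript
transcript = [{"startTime": 30.0, "endTime": 36.0, "ref": "Provided by YTS.LT"}]

# First encoding: list -> JSON text (a Python str containing real newlines)
as_text = json.dumps(transcript, indent=2)

# Second encoding: that str -> a JSON string literal (newlines become \n, quotes become \")
double_encoded = json.dumps(as_text)
print(double_encoded)

# Encoding the list only once produces a real JSON array
single_encoded = json.dumps(transcript, indent=2)
print(single_encoded)

# Reading each back shows the difference in type
print(type(json.loads(double_encoded)).__name__)  # str
print(type(json.loads(single_encoded)).__name__)  # list
```

json.dump(obj, f) behaves like json.dumps(obj) but writes to the file, so the same rule applies: encode the list once.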
import re
import json

# Match one SRT block: index line, "start --> end" timing line, then the text up to a blank line
regex = r'(?:\d+)\s(\d+:\d+:\d+,\d+) --> (\d+:\d+:\d+,\d+)\s+(.+?)(?:\n\n|$)'
# Convert an "HH:MM:SS,mmm" timestamp to seconds
offset_seconds = lambda ts: sum(howmany * sec for howmany, sec in zip(map(int, ts.replace(',', ':').split(':')), [60 * 60, 60, 1, 1e-3]))
with open('My_SRT_File.srt') as srt:
    transcript = [dict(startTime=offset_seconds(startTime), endTime=offset_seconds(endTime), ref=' '.join(ref.split())) for startTime, endTime, ref in re.findall(regex, srt.read(), re.DOTALL)]
# Pass the list itself (not a json.dumps() string) so it is written as a JSON array
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(transcript, f, ensure_ascii=False, indent=2)
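On the second part of the question: transcript is an ordinary Python list of dicts, so you can read startTime from it directly without writing any file. A minimal sketch, assuming the same regex as in the code above and a small hypothetical SRT string (sample_srt) in place of My_SRT_File.srt. The helper names parse_srt and to_seconds are illustrative only; to_seconds does the same conversion as the offset_seconds lambda, just spelled out.

```python
import re

# Same pattern as above: index, "start --> end" line, then text up to a blank line
SRT_BLOCK = re.compile(r'(?:\d+)\s(\d+:\d+:\d+,\d+) --> (\d+:\d+:\d+,\d+)\s+(.+?)(?:\n\n|$)', re.DOTALL)

def to_seconds(ts):
    """Hypothetical helper: convert an SRT timestamp 'HH:MM:SS,mmm' to seconds."""
    hms, millis = ts.split(',')
    hours, minutes, seconds = map(int, hms.split(':'))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000

def parse_srt(text):
    """Hypothetical helper: return a list of dicts with startTime, endTime and ref."""
    return [
        {"startTime": to_seconds(start), "endTime": to_seconds(end), "ref": " ".join(ref.split())}
        for start, end, ref in SRT_BLOCK.findall(text)
    ]

# Hypothetical sample input; for a real file, pass open('My_SRT_File.srt').read() instead
sample_srt = """1
00:00:30,000 --> 00:00:36,000
Provided by YTS.LT

2
00:00:36,500 --> 00:00:42,000
Find the official YIFY movies site at https://YTS.LT
"""

transcript = parse_srt(sample_srt)

# No file needed: the values are already in memory
start_times = [entry["startTime"] for entry in transcript]
print(start_times)  # [30.0, 36.5]
print(transcript[0]["startTime"])  # 30.0
```

Writing data.json is only needed if another program has to read the result; within the same script, indexing the list is enough.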