I intend to use Google Cloud Speech Transcription through the Video Intelligence API. The following code only analyses part of the video (from 55 s to 80 s):
from google.cloud import videointelligence
from google.cloud.videointelligence import enums, types  # google-cloud-videointelligence 1.x only


def transcribe_speech(video_uri, language_code, segments=None):
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [enums.Feature.SPEECH_TRANSCRIPTION]

    config = types.SpeechTranscriptionConfig(
        language_code=language_code,
        enable_automatic_punctuation=True,
    )
    context = types.VideoContext(
        segments=segments,
        speech_transcription_config=config,
    )

    print(f'Processing video "{video_uri}"...')
    operation = video_client.annotate_video(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )
    return operation.result()


video_uri = "gs://cloudmleap/video/next/JaneGoodall.mp4"
language_code = "en-GB"

# Restrict the analysis to the part of the video between 55 s and 80 s
segment = types.VideoSegment()
segment.start_time_offset.FromSeconds(55)
segment.end_time_offset.FromSeconds(80)
response = transcribe_speech(video_uri, language_code, [segment])
How can I analyse the whole video automatically instead of defining a particular segment?
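You can follow the speech transcription tutorial in the Video Intelligence documentation, which transcribes a whole video. Nothing special is needed for that: if no segments are set in the VideoContext, the API treats the entire video as a single segment. So just drop the segment list (or don't set segments on the VideoContext at all). The input has to be stored in a Google Cloud Storage (GCS) bucket; your sample video already is, so that part should work as-is.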
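Applied to your own function, a minimal sketch with the 2.x client could build the annotate_video request so that segments is only included when you actually pass some; leaving it out makes the API process the whole video. The helper name build_transcription_request is hypothetical, and the sketch assumes the 2.x client accepts plain nested dicts (and datetime.timedelta for segment offsets) in place of message objects.

```python
from datetime import timedelta
from typing import Any, Dict, List, Optional


def build_transcription_request(
    video_uri: str,
    language_code: str,
    features: List[Any],
    segments: Optional[List[Dict[str, timedelta]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a request dict for annotate_video.

    If segments is None or empty, no "segments" key is sent, so the API
    treats the whole video as a single segment.
    Assumes the 2.x client accepts nested dicts for message fields.
    """
    video_context: Dict[str, Any] = {
        "speech_transcription_config": {
            "language_code": language_code,
            "enable_automatic_punctuation": True,
        }
    }
    if segments:
        # Only restrict the analysis when specific segments are requested.
        video_context["segments"] = segments
    return {
        "input_uri": video_uri,
        "features": features,
        "video_context": video_context,
    }


# Usage with the 2.x client (needs credentials and network access):
# from google.cloud import videointelligence
# client = videointelligence.VideoIntelligenceServiceClient()
# request = build_transcription_request(
#     "gs://cloudmleap/video/next/JaneGoodall.mp4",
#     "en-GB",
#     [videointelligence.Feature.SPEECH_TRANSCRIPTION],
# )
# result = client.annotate_video(request=request).result(timeout=600)
```

Passing a list such as [{"start_time_offset": timedelta(seconds=55), "end_time_offset": timedelta(seconds=80)}] as segments would reproduce the partial analysis from the question; omitting it transcribes the whole video.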
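Just make sure that you have installed the latest version of the Video Intelligence client library. Note that version 2.x of google-cloud-videointelligence removed the enums and types modules that your code uses, so after upgrading use the 2.x names such as videointelligence.Feature and videointelligence.VideoContext, as the documentation snippet does.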
pip install --upgrade google-cloud-videointelligence
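Because the question's code targets the 1.x modules (enums, types) while the documentation snippet targets 2.x, it can help to check which major version is installed before running either one. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); the function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the major component of a version string such as "2.11.0"."""
    return int(version_string.split(".")[0])


def installed_videointelligence_major() -> Optional[int]:
    """Major version of google-cloud-videointelligence, or None if not installed."""
    try:
        return major_version(version("google-cloud-videointelligence"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = installed_videointelligence_major()
    if major is None:
        print("google-cloud-videointelligence is not installed")
    elif major < 2:
        print("1.x detected: upgrade before using the 2.x-style code")
    else:
        print(f"{major}.x detected")
```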
Here is the code snippet from the Video Intelligence documentation for transcribing speech from a video:
"""Transcribe speech from a video stored on GCS."""
from google.cloud import videointelligence
path="gs://your_gcs_bucket/your_video.mp4"
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.SPEECH_TRANSCRIPTION]
config = videointelligence.SpeechTranscriptionConfig(
language_code="en-US", enable_automatic_punctuation=True
)
video_context = videointelligence.VideoContext(speech_transcription_config=config)
operation = video_client.annotate_video(
request={
"features": features,
"input_uri": path,
"video_context": video_context,
}
)
print("\nProcessing video for speech transcription.")
result = operation.result(timeout=600)
# There is only one annotation_result since only
# one video is processed.
annotation_results = result.annotation_results[0]
for speech_transcription in annotation_results.speech_transcriptions:
# The number of alternatives for each transcription is limited by
# SpeechTranscriptionConfig.max_alternatives.
# Each alternative is a different possible transcription
# and has its own confidence score.
for alternative in speech_transcription.alternatives:
print("Alternative level information:")
print("Transcript: {}".format(alternative.transcript))
print("Confidence: {}\n".format(alternative.confidence))
print("Word level information:")
for word_info in alternative.words:
word = word_info.word
start_time = word_info.start_time
end_time = word_info.end_time
print(
"\t{}s - {}s: {}".format(
start_time.seconds + start_time.microseconds * 1e-6,
end_time.seconds + end_time.microseconds * 1e-6,
word,
)
)
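The snippet converts each word's offsets with seconds + microseconds * 1e-6 because the 2.x client returns them as datetime.timedelta values. A minimal alternative, assuming the same timedelta objects, is timedelta.total_seconds(). The helper format_word_timings and the SimpleNamespace stand-ins are hypothetical and only mimic the shape of alternative.words.

```python
from datetime import timedelta
from types import SimpleNamespace
from typing import Iterable, List


def format_word_timings(words: Iterable) -> List[str]:
    """Return "start - end: word" lines for WordInfo-like objects.

    Assumes start_time and end_time are datetime.timedelta values.
    """
    lines = []
    for word_info in words:
        start = word_info.start_time.total_seconds()
        end = word_info.end_time.total_seconds()
        lines.append(f"{start:.1f}s - {end:.1f}s: {word_info.word}")
    return lines


# Stand-in objects for illustration; real ones come from alternative.words.
sample = [
    SimpleNamespace(word="Hello", start_time=timedelta(seconds=1.5), end_time=timedelta(seconds=1.9)),
    SimpleNamespace(word="world", start_time=timedelta(seconds=2), end_time=timedelta(seconds=2.4)),
]
for line in format_word_timings(sample):
    print(line)
```

Inside the loop over speech_transcription.alternatives, format_word_timings(alternative.words) would give the same per-word lines as the documentation snippet, rounded to one decimal place.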