Reputation: 523
I am working with the Google Cloud Video Intelligence API and I am trying to get the results into a pandas DataFrame. The API returns its results as a RepeatedCompositeContainer, so my thought was to build the DataFrame inside the for loop used in the API function.
This is how the API function processes the results:
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
With the help of this Stack Overflow article I created an empty list and appended the results to it, to be converted into a pandas DataFrame later, as below:
df = []
# Process video/segment level label annotations
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
        df.append({'Description': category_entity.description})
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        df.append({'Confidence': segment.confidence, 'Start': start_time, 'End': end_time})
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
When I append only in the last for loop, I get a nicely structured DataFrame, as below:
>>> frame = pd.DataFrame(df)
>>> frame
Confidence         End  Start
  0.704168  599.682416    0.0
  0.737053  599.682416    0.0
  0.832496  599.682416    0.0
  0.427637  599.682416    0.0
  0.518693  599.682416    0.0
However, when I added the same logic to the other for loop, I got a distorted DataFrame, as below:
>>> frame = pd.DataFrame(df)
>>> frame
Confidence  Description         End  Start
       NaN   technology         NaN    NaN
  0.741133          NaN  599.682416    0.0
       NaN     keyboard         NaN    NaN
  0.328138          NaN  599.682416    0.0
       NaN       person         NaN    NaN
  0.436333          NaN  599.682416    0.0
       NaN       person         NaN    NaN
I was hoping there is a way to fix this and get a DataFrame like the one below:
>>> frame = pd.DataFrame(df)
>>> frame
Confidence  Description         End  Start
  0.741133   technology  599.682416    0.0
  0.328138     keyboard  599.682416    0.0
  0.436333       person  599.682416    0.0
What can I try next?
Upvotes: 2
Views: 1592
Reputation: 5460
Change your code as follows:
df = []
# Process video/segment level label annotations
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    label_row = {}  # Create a dictionary for the label
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
        # Add the description (the last category wins if there are several)
        label_row['Description'] = category_entity.description
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        row_segment_info = {'Confidence': segment.confidence, 'Start': start_time, 'End': end_time}
        # Combine label and segment info into a new dict so rows do not share state
        row = dict(label_row, **row_segment_info)
        df.append(row)  # Now add the row, once per segment
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
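In summary: you were appending a separate dictionary in each inner loop, so the description and the segment data ended up in different rows, and pandas filled the missing keys with NaN. Build one dictionary per row and append it only once.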
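To try the idea without calling the API, here is a minimal sketch that builds one complete dictionary per segment. The `make_label` and `to_seconds` helpers and the sample values are made up for illustration. They only mimic the attribute layout used in the question (`entity`, `category_entities`, `segments`) and are not part of the Video Intelligence client library. Joining several category descriptions with a comma is also just one possible choice.

```python
from types import SimpleNamespace as NS

import pandas as pd


def make_label(description, categories, confidence, end_seconds, end_nanos):
    # Hypothetical stand-in for one segment label annotation
    segment = NS(
        segment=NS(
            start_time_offset=NS(seconds=0, nanos=0),
            end_time_offset=NS(seconds=end_seconds, nanos=end_nanos),
        ),
        confidence=confidence,
    )
    return NS(
        entity=NS(description=description),
        category_entities=[NS(description=c) for c in categories],
        segments=[segment],
    )


def to_seconds(offset):
    # Convert a seconds/nanos pair into fractional seconds
    return offset.seconds + offset.nanos / 1e9


# Made-up sample data, not real API output
segment_labels = [
    make_label('laptop', ['technology'], 0.74, 599, 682416000),
    make_label('typing', ['keyboard'], 0.33, 599, 682416000),
    make_label('man', ['person'], 0.44, 599, 682416000),
]

rows = []
for segment_label in segment_labels:
    # Join all category descriptions so none is silently overwritten
    description = ', '.join(
        c.description for c in segment_label.category_entities)
    for segment in segment_label.segments:
        # One complete dictionary per segment, appended exactly once
        rows.append({
            'Description': description,
            'Confidence': segment.confidence,
            'Start': to_seconds(segment.segment.start_time_offset),
            'End': to_seconds(segment.segment.end_time_offset),
        })

frame = pd.DataFrame(rows, columns=['Confidence', 'Description', 'End', 'Start'])
print(frame)
```

Because every dictionary carries all four keys, no cell is left for pandas to fill with NaN.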
Upvotes: 2