Ronak Bhuptani

Reputation: 71

How to get all comments (more than 100) of a video using YouTube Data API V3?

I am currently working on a project and I need to collect all the comments of a few specific YouTube videos.
I am able to get at most 100 comments per request using the commentThreads().list function (More here). Is there any way to get all the comments?

I am using the function below, which is provided in the Google YouTube Data API developer guide.

def get_comment_threads(youtube, video_id):
  results = youtube.commentThreads().list(
    part="snippet",
    maxResults=100,
    videoId=video_id,
    textFormat="plainText"
  ).execute()

  for item in results["items"]:
    comment = item["snippet"]["topLevelComment"]
    author = comment["snippet"]["authorDisplayName"]
    text = comment["snippet"]["textDisplay"]
    print "Comment by %s: %s" % (author, text)

  return results["items"]

Upvotes: 4

Views: 5941

Answers (3)

Parth Phalke

Reputation: 1

@Anthony Camarillo you are correct, exception handling is necessary in this case. Secondly, I'm adding a correction to @minhaj's answer, since it keeps requesting the same comment page of the video and we end up in an infinite while loop. The key is to call the get_comment_threads() function with the nextPageToken parameter. I'm using pandas to store the data in a DataFrame.

Here is the code that worked for me:

import os

import googleapiclient.discovery
import pandas as pd

os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

api_service_name = "youtube"
api_version = "v3"
DEVELOPER_KEY = "Your_API_KEY"
video_id = "Your_Video_id"

youtube = googleapiclient.discovery.build(
    api_service_name, api_version, developerKey=DEVELOPER_KEY)

comments = []
authors = []

def load_comments(match):
    for item in match["items"]:
        comment = item["snippet"]["topLevelComment"]
        author = comment["snippet"]["authorDisplayName"]
        text = comment["snippet"]["textDisplay"]
        comments.append(text)
        authors.append(author)
        print("Comment by {}: {}".format(author, text))
        if 'replies' in item.keys():
            for reply in item['replies']['comments']:
                rauthor = reply['snippet']['authorDisplayName']
                rtext = reply["snippet"]["textDisplay"]
                print("\n\tReply by {}: {}".format(rauthor, rtext), "\n")

def get_comment_threads(youtube, video_id, nextPageToken):
    results = youtube.commentThreads().list(
        part="snippet,replies",  # request "replies" as well so reply data is returned
        maxResults=100,
        videoId=video_id,
        textFormat="plainText",
        pageToken=nextPageToken
    ).execute()
    return results

# An empty pageToken returns the first page of comment threads.
match = get_comment_threads(youtube, video_id, '')
load_comments(match)

try:
    next_page_token = match["nextPageToken"]
    while next_page_token:
        match = get_comment_threads(youtube, video_id, next_page_token)
        load_comments(match)
        next_page_token = match["nextPageToken"]
except KeyError:
    # The last page of results has no nextPageToken, so the loop ends here.
    pass

data = pd.DataFrame(comments, index=authors, columns=["Comments"])
print(data)
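
If you want to keep the collected comments for later analysis, one small optional step (not part of the original answer, just an illustration using the data DataFrame built above) is to write it out with pandas:

# Assumes the "data" DataFrame created above; the file name is arbitrary.
data.to_csv("comments.csv", encoding="utf-8")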

Upvotes: 0

minhaj

Reputation: 108

As said in the comments above, you can simply keep using next_page_token in a while loop until you stop getting a next page token. But be aware that some videos have a very large number of comments, and this will take a long time to load.

I am extending the code you posted above.

I also copied some parts of this code from a GitHub repository which I do not remember now.

Update the youtube and video_id variables to the same values you used with your get_comment_threads function previously.

def load_comments(match):
    for item in match["items"]:
        comment = item["snippet"]["topLevelComment"]
        author = comment["snippet"]["authorDisplayName"]
        text = comment["snippet"]["textDisplay"]
        print("Comment by {}: {}".format(author, text))
        if 'replies' in item.keys():
            for reply in item['replies']['comments']:
                rauthor = reply['snippet']['authorDisplayName']
                rtext = reply["snippet"]["textDisplay"]
                print("\n\tReply by {}: {}".format(rauthor, rtext), "\n")

def get_comment_threads(youtube, video_id):
    results = youtube.commentThreads().list(
        part="snippet,replies",  # request "replies" as well so reply data is returned
        maxResults=100,
        videoId=video_id,
        textFormat="plainText"
    ).execute()
    return results

video_id = ""   # the ID of the video, as in your original code
youtube = ""    # the authorized API client, as in your original code
match = get_comment_threads(youtube, video_id)
next_page_token = match["nextPageToken"]
load_comments(match)

while next_page_token:
    match = get_comment_threads(youtube, video_id)
    next_page_token = match["nextPageToken"]
    load_comments(match)

Upvotes: 3

Anthony Camarillo

Reputation: 1

To add to @minhaj's answer,

The while loop will run until the last commentThreads.list() response; however, the last response won't have a nextPageToken key and will therefore raise a KeyError.

A simple try/except was enough to fix this:

try:
    while next_page_token:
        match = get_comment_threads(youtube, video_id)
        next_page_token = match["nextPageToken"]
        load_comments(match)
except KeyError:
    # The last response has no nextPageToken, so load its comments and stop.
    match = get_comment_threads(youtube, video_id)
    load_comments(match)
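
As an alternative sketch (my own illustration, not from the answers above): since the last response simply lacks the nextPageToken key, the try/except can be avoided by reading the token with dict.get(), which returns None when the key is missing. This assumes the variant of get_comment_threads(youtube, video_id, nextPageToken) from the pandas-based answer above, which accepts a page token:

# Minimal sketch, assuming get_comment_threads() accepts a pageToken argument.
next_page_token = ''
while True:
    match = get_comment_threads(youtube, video_id, next_page_token)
    load_comments(match)
    next_page_token = match.get("nextPageToken")  # None on the last page
    if not next_page_token:
        break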

Upvotes: 0
