Reputation: 71
I am currently working on a project in which I need to collect all the comments on a few specific YouTube videos.
I can get at most 100 comments per request using the commentThreads().list method (More here). Is there any way to get all the comments?
I am using the function below, which is provided in the Google YouTube Data API developer guide.
def get_comment_threads(youtube, video_id):
    results = youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText"
    ).execute()

    for item in results["items"]:
        comment = item["snippet"]["topLevelComment"]
        author = comment["snippet"]["authorDisplayName"]
        text = comment["snippet"]["textDisplay"]
        print("Comment by {}: {}".format(author, text))

    return results["items"]
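For what it's worth, the google-api-python-client library that these samples use ships a list_next() helper that follows nextPageToken for you. Below is a minimal sketch under that assumption; the helper name get_all_comment_threads is hypothetical, and this has not been run against the live API:

```python
def get_all_comment_threads(youtube, video_id):
    """Collect comment-thread items across every page of results."""
    items = []
    request = youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText")
    # list_next() returns the request for the next page,
    # or None once there is no nextPageToken left to follow.
    while request is not None:
        response = request.execute()
        items.extend(response["items"])
        request = youtube.commentThreads().list_next(request, response)
    return items
```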
Upvotes: 4
Views: 5941
Reputation: 1
@Anthony Camarillo, you are correct: exception handling is necessary in this case. Secondly, I'm adding a small correction to @minhaj's answer, as it keeps requesting the same comment page of the video, so we end up in an infinite while loop. The key is to call get_comment_threads() with the nextPageToken parameter. I'm using pandas to store the data in a DataFrame.
Here is the code that worked for me:
import os

import googleapiclient.discovery
import pandas as pd

os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

api_service_name = "youtube"
api_version = "v3"
DEVELOPER_KEY = "Your_API_KEY"
video_id = "Your_Video_id"

youtube = googleapiclient.discovery.build(
    api_service_name, api_version, developerKey=DEVELOPER_KEY)
comments = []
authors = []

def load_comments(match):
    for item in match["items"]:
        comment = item["snippet"]["topLevelComment"]
        author = comment["snippet"]["authorDisplayName"]
        text = comment["snippet"]["textDisplay"]
        comments.append(text)
        authors.append(author)
        print("Comment by {}: {}".format(author, text))
        if 'replies' in item.keys():
            for reply in item['replies']['comments']:
                rauthor = reply['snippet']['authorDisplayName']
                rtext = reply["snippet"]["textDisplay"]
                print("\n\tReply by {}: {}".format(rauthor, rtext), "\n")
def get_comment_threads(youtube, video_id, nextPageToken):
    results = youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText",
        pageToken=nextPageToken
    ).execute()
    return results
match = get_comment_threads(youtube, video_id, '')
load_comments(match)

try:
    while True:
        # raises KeyError on the last page, which has no nextPageToken
        next_page_token = match["nextPageToken"]
        match = get_comment_threads(youtube, video_id, next_page_token)
        load_comments(match)
except KeyError:
    data = pd.DataFrame(comments, index=authors, columns=["Comments"])
    print(data)
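One caveat on the replies handling above: the replies embedded in a commentThreads response are not guaranteed to be the complete set. Per the API documentation, all replies to a thread come from comments.list with the parentId parameter. A hedged sketch of that (the helper name fetch_all_replies is mine, not from the answer, and this is untested against the live API):

```python
def fetch_all_replies(youtube, thread_id):
    """Page through comments.list(parentId=...) for one comment thread."""
    replies = []
    request = youtube.comments().list(
        part="snippet",
        parentId=thread_id,
        maxResults=100,
        textFormat="plainText")
    # list_next() follows nextPageToken, returning None after the last page
    while request is not None:
        response = request.execute()
        replies.extend(response["items"])
        request = youtube.comments().list_next(request, response)
    return replies
```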
Upvotes: 0
Reputation: 108
As said in the comments above, you can simply use next_page_token and loop until you stop getting a next page token. But beware that some videos have a really large number of comments, and those will take a long time to load.
Also, I am writing to extend your above-mentioned code. (I also copied some parts of this code from a GitHub repository that I no longer remember.) Update the youtube and video_id variables as you used them in your get_comment_threads function previously.
def load_comments(match):
    for item in match["items"]:
        comment = item["snippet"]["topLevelComment"]
        author = comment["snippet"]["authorDisplayName"]
        text = comment["snippet"]["textDisplay"]
        print("Comment by {}: {}".format(author, text))
        if 'replies' in item.keys():
            for reply in item['replies']['comments']:
                rauthor = reply['snippet']['authorDisplayName']
                rtext = reply["snippet"]["textDisplay"]
                print("\n\tReply by {}: {}".format(rauthor, rtext), "\n")

def get_comment_threads(youtube, video_id):
    results = youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText"
    ).execute()
    return results
video_id = ""
youtube = ""

match = get_comment_threads(youtube, video_id)
next_page_token = match["nextPageToken"]
load_comments(match)

while next_page_token:
    match = get_comment_threads(youtube, video_id)
    next_page_token = match["nextPageToken"]
    load_comments(match)
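As written, the loop above never passes next_page_token back into the request, so each iteration refetches the first page of results. A hedged sketch of the same flow with the token threaded through (the page_token parameter and the collect_all helper are my additions, not part of this answer's code):

```python
def get_comment_threads(youtube, video_id, page_token=None):
    """Fetch one page of comment threads, starting from page_token."""
    return youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText",
        pageToken=page_token,
    ).execute()

def collect_all(youtube, video_id):
    """Walk every page by passing nextPageToken into the next request."""
    texts = []
    token = None
    while True:
        match = get_comment_threads(youtube, video_id, token)
        for item in match["items"]:
            texts.append(item["snippet"]["topLevelComment"]["snippet"]["textDisplay"])
        token = match.get("nextPageToken")  # absent on the last page
        if not token:
            break
    return texts
```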
Upvotes: 3
Reputation: 1
To add to @minhaj's answer: the while loop will run until the last commentThreads.list() response, but the last response won't have a nextPageToken key and will therefore throw a KeyError.
A simple try/except fixed this:
try:
    while next_page_token:
        match = get_comment_threads(youtube, video_id)
        next_page_token = match["nextPageToken"]
        load_comments(match)
except KeyError:
    match = get_comment_threads(youtube, video_id)
    load_comments(match)
Upvotes: 0