SmilingSwordman

Reputation: 67

How to create a DataFrame with Reddit API loop and manage the list

I'm very new to the Reddit API (PRAW/PSAW), Python, and programming in general. I'm trying to get the top submissions from certain subreddits within a 6-month window, convert the list into a DataFrame, and later save it as a CSV file.

I want to:

  1. Get the length of the list
  2. Sort by date (epoch)
  3. Make a data frame out of this

What I tried so far:

list_submission = []
for submission in reddit.subreddit('bitcoin').top(limit=None):
    if submission.created_utc >=1569902400 and submission.created_utc <=1585627200:
        print(submission.created_utc, submission.title, submission.score, submission.id) # This seems to get me the data I want.
        len() # I want to check the length, but it doesn't work. It just gives me a row of zeroes.
        sorted(submission.created_utc) # This also doesn't work. It says 'float' object is not iterable. 
                                       # I tried converting to int, but also didn't work.
pd.DataFrame(list_submission) # Also doesn't work.

So in brief, I suppose making a DataFrame out of this could also solve the first two problems, but being able to do them in plain code would still be helpful when evaluating the list!

Upvotes: 0

Views: 576

Answers (1)

Tot Zam

Reputation: 8756

To answer the 3 parts of your question:

  1. To get the length of a list, pass the list you want to evaluate to the built-in len() function. To find the length of list_submission, call len(list_submission). Calling len() with no argument gives it nothing to measure (Python raises a TypeError for it), so it cannot tell you how many submissions you collected.
  2. If the submission matches the requirements, append it to the list of submissions with list_submission.append(submission). After the for loop is complete, use sorted() to sort the entire list. Pass in the whole list plus the key you want to sort on: sorted(list_submission, key=lambda submission: submission.created_utc). sorted() returns a new list, so assign the result to a variable. You are getting the error because sorted() expects an iterable, and submission.created_utc is a single float.
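  3. pd.DataFrame() needs plain rows rather than Submission objects, so build one row per submission from the fields you want, for example [s.created_utc, s.title, s.score, s.id]. Then pass columns=['created_utc', 'title', 'score', 'id'] to set the column names.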
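Points 1 and 2 can be tried without touching the Reddit API. The following is only a minimal sketch: the SimpleNamespace objects are hypothetical stand-ins for PRAW submissions, carrying just the id and created_utc attributes, and the timestamps are made up.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for PRAW submissions (only the attributes used here)
submissions = [
    SimpleNamespace(id="c", created_utc=1580000000.0),
    SimpleNamespace(id="a", created_utc=1570000000.0),
    SimpleNamespace(id="b", created_utc=1575000000.0),
]

count = len(submissions)  # len() needs the list as its argument

# sorted() takes the whole list plus a key function and returns a NEW list
by_date = sorted(submissions, key=lambda s: s.created_utc)
ordered_ids = [s.id for s in by_date]
original_ids = [s.id for s in submissions]  # the input list is left unchanged

print(count, ordered_ids, original_ids)
```

If you would rather sort in place and not keep the original order, list_submission.sort(key=...) takes the same key argument.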

Final code will look something like the following:

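import pandas as pd  # `reddit` is the praw.Reddit instance from the question

list_submission = []
for submission in reddit.subreddit('bitcoin').top(limit=None):
    # Keep only submissions inside the epoch-second window from the question
    if 1569902400 <= submission.created_utc <= 1585627200:
        list_submission.append(submission)

# Length of the filtered list
print(len(list_submission))

# sorted() returns a new list, so assign the result
list_submission = sorted(list_submission, key=lambda submission: submission.created_utc)

# Build one plain row per submission; DataFrame cannot unpack Submission objects itself
rows = [[s.created_utc, s.title, s.score, s.id] for s in list_submission]
df = pd.DataFrame(rows, columns=['created_utc', 'title', 'score', 'id'])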
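As the question suspects, once the data is in a DataFrame, pandas can also handle the counting and sorting, and the CSV export mentioned there. Here is a rough sketch of that route, assuming the rows have already been pulled out of the submissions; the titles, scores, ids and timestamps below are invented sample data, and the extra created column is just an optional convenience.

```python
import pandas as pd

# Hypothetical rows standing in for the fields pulled from each submission
rows = [
    [1580000000.0, "Third post", 12, "c"],
    [1570000000.0, "First post", 30, "a"],
    [1575000000.0, "Second post", 7, "b"],
]

df = pd.DataFrame(rows, columns=["created_utc", "title", "score", "id"])

# Sort inside pandas instead of with sorted(); reset the index afterwards
df = df.sort_values("created_utc").reset_index(drop=True)

# Optional: a readable timestamp column derived from the epoch seconds
df["created"] = pd.to_datetime(df["created_utc"], unit="s")

row_count = len(df)  # len() on a DataFrame gives the number of rows

# to_csv() with no path returns the CSV text; pass a filename to write a file
csv_text = df.to_csv(index=False)
print(df)
```

To write the file directly, pass a path of your choice, for example df.to_csv('submissions.csv', index=False), where the filename is only a placeholder.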

Upvotes: 1
