Andrew

Reputation: 195

I am trying to scrape data with Reddit's API, but I get a ValueError when I build a DataFrame from the results. Why does this happen?

I am trying to scrape recipes from Reddit's API. However, I keep getting an error. If you could help me fix this, then that would be much appreciated.

Here is the code I have used:

#! python3
import praw
import pandas as pd
import datetime as dt
reddit=praw.Reddit(client_id='RpdZdsNcyIE9vg', \
                   client_secret='aVlCaLr5XMfP4BP-1a8-4B2uOo8', \
                   user_agent= 'Food Parser', \
                   username= 'AndrewPlummer2020', \
                   password= 'John3:18')
subreddit=reddit.subreddit('recipes')
top_subreddit=subreddit.top(limit=800)
for submission in subreddit.top(limit=1):
    print(submission.title, submission.id)
topics_dict = {"title":[], \
               "score":[], \
               "id": [], "url": [], \
               "comms_num": [], \
               "created": [], \
               "body": []}
for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)

topics_data=pd.DataFrame(topics_dict)
topics_data.to_csv("Dish Recpies.csv", sep='\t')

Here is the error I get.

Traceback (most recent call last):
  File "C:/Users/plumm/AppData/Local/Programs/Python/Python37/Reddit_scraper.py", line 27, in <module>
    topics_data=pd.DataFrame(topics_dict)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 411, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 257, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 77, in arrays_to_mgr
    index = extract_index(arrays)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 368, in extract_index
    raise ValueError("arrays must all be same length")
ValueError: arrays must all be same length

Any help would be much appreciated. Thank you in advance.

Upvotes: 2

Views: 273

Answers (1)

Chris Happy

Reputation: 7295

pandas is complaining that your arrays are not all the same length.

As @Aran-Fey mentioned, this is because the lists topics_dict['id'] and topics_dict['url'] stay empty: nothing is ever appended to them in the following loop, while the other five lists each receive one entry per submission.

for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)
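One way to confirm this before calling pd.DataFrame is to compare the length of every list in the dict. The snippet below is only a sketch: the helper names column_lengths and mismatched_columns are made up for illustration, and the small topics_dict is hypothetical sample data standing in for the real one after the loop has run.

```python
def column_lengths(columns):
    """Return the number of items collected for each column."""
    return {name: len(values) for name, values in columns.items()}


def mismatched_columns(columns):
    """Return the columns whose length differs from the longest column."""
    lengths = column_lengths(columns)
    longest = max(lengths.values(), default=0)
    return {name: n for name, n in lengths.items() if n != longest}


# Hypothetical stand-in for topics_dict after the loop:
# "id" and "url" were declared but never appended to.
topics_dict = {
    "title": ["Garlic Butter Steak", "Shoyu ramen broth"],
    "score": [100, 90],
    "id": [],
    "url": [],
}

print(mismatched_columns(topics_dict))  # {'id': 0, 'url': 0}
```

Any column reported here is one that the loop forgot to fill.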

To fix this, add the following lines:

for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)

    # Add url and id
    topics_dict['id'].append(submission.id)
    topics_dict['url'].append(submission.url)

Output on repl.it when printing the resulting DataFrame:

                                                 title  ...  body
0             Garlic Butter Steak and Potatoes Skillet  ...
1    This shoyu ramen broth is our family's favorit...  ...
2    Wasn't sure how to properly thank a stranger f...  ...
3    I'm working on moving all my mothers hand writ...  ...
4                   I made my first ever loaf of bread  ...
..                                                 ...  ...   ...
795            Linguine with Golden Beet and Beef Ragù  ...
796                             Vegan Stone Fruit Tart  ...
797                               Mojito Chicken Tacos  ...
798                                  Mongolian Chicken  ...
799  Stuffed Handmade flat bread ( Paratha )with Eg...  ...

[800 rows x 7 columns]
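If you want to rule out this kind of mismatch altogether, one option is to build one dict per submission and pass the list of dicts to pd.DataFrame, so every row carries all seven fields at once. This is a sketch rather than a drop-in replacement: submission_to_row is a hypothetical helper, and the SimpleNamespace objects are fake submissions with invented values, used so the example runs without Reddit credentials. With PRAW you would iterate over subreddit.top(...) instead.

```python
import io
from types import SimpleNamespace

import pandas as pd


def submission_to_row(submission):
    """Collect all seven fields for one submission in a single dict."""
    return {
        "title": submission.title,
        "score": submission.score,
        "id": submission.id,
        "url": submission.url,
        "comms_num": submission.num_comments,
        "created": submission.created,
        "body": submission.selftext,
    }


# Fake submissions (invented values) standing in for subreddit.top(limit=...).
fake_submissions = [
    SimpleNamespace(title="First recipe", score=10, id="abc123",
                    url="https://example.com/1", num_comments=3,
                    created=1577836800.0, selftext=""),
    SimpleNamespace(title="Second recipe", score=7, id="def456",
                    url="https://example.com/2", num_comments=1,
                    created=1577923200.0, selftext="Some text"),
]

rows = [submission_to_row(s) for s in fake_submissions]
topics_data = pd.DataFrame(rows)

# Write tab-separated output to an in-memory buffer instead of a file.
buffer = io.StringIO()
topics_data.to_csv(buffer, sep="\t", index=False)
print(topics_data.shape)
```

Because each row is built in one place, a forgotten field shows up as a missing key in submission_to_row rather than as a length error later. Note that the separator keyword of to_csv is sep.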

Last note: Don't forget to change your password and client_secret once you've solved your problem, since both are now public in this post :)

Upvotes: 1
