Reputation: 195
I am trying to scrape recipes from Reddit's API. However, I keep getting an error. If you could help me fix this, then that would be much appreciated.
Here is the code I have used:
#! python3
import praw
import pandas as pd
import datetime as dt
reddit = praw.Reddit(client_id='RpdZdsNcyIE9vg',
                     client_secret='aVlCaLr5XMfP4BP-1a8-4B2uOo8',
                     user_agent='Food Parser',
                     username='AndrewPlummer2020',
                     password='John3:18')
subreddit=reddit.subreddit('recipes')
top_subreddit=subreddit.top(limit=800)
for submission in subreddit.top(limit=1):
    print(submission.title, submission.id)
topics_dict = {"title": [],
               "score": [],
               "id": [], "url": [],
               "comms_num": [],
               "created": [],
               "body": []}
for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)
topics_data=pd.DataFrame(topics_dict)
topics_data.to_csv("Dish Recpies.csv", sep='\t')
Here is the error I get.
Traceback (most recent call last):
  File "C:/Users/plumm/AppData/Local/Programs/Python/Python37/Reddit_scraper.py", line 27, in <module>
    topics_data=pd.DataFrame(topics_dict)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 411, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 257, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 77, in arrays_to_mgr
    index = extract_index(arrays)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 368, in extract_index
    raise ValueError("arrays must all be same length")
ValueError: arrays must all be same length
Any help would be much appreciated. Thank you in advance.
Upvotes: 2
Views: 273
Reputation: 7295
pandas is complaining that your arrays are not all the same length.
As @Aran-Fey mentioned, that is because the lists topics_dict['id'] and topics_dict['url'] stay empty: the following loop appends to the other five lists but never to these two.
for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)
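You can reproduce the same error without touching Reddit at all. Here is a minimal sketch, assuming only that pandas is installed; the titles and scores are made-up sample values, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

# Two lists are filled and two stay empty, just like 'id' and 'url' in the question.
columns = {
    "title": ["Garlic Butter Steak", "Mojito Chicken Tacos"],  # made-up sample data
    "score": [10, 7],                                          # made-up sample data
    "id": [],
    "url": [],
}

# Checking the lengths first makes the mismatch visible before pandas raises.
lengths = {name: len(values) for name, values in columns.items()}
print(lengths)

caught = None
try:
    pd.DataFrame(columns)
except ValueError as error:
    # pandas refuses to build a DataFrame from columns of different lengths.
    caught = error
    print("ValueError:", error)
```

Printing the lengths of your own topics_dict in the same way is a quick check whenever this error shows up.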
To fix this, add the following lines:
for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)
    # Add url and id
    topics_dict['id'].append(submission.id)
    topics_dict['url'].append(submission.url)
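If you want to rule out this kind of mismatch entirely, one option (not what your code does, just an alternative) is to build one dict per submission and hand the list of dicts to pandas, so a forgotten field cannot leave a column shorter than the rest. A rough sketch follows; fake_submissions and submission_to_row are hypothetical names, and the SimpleNamespace objects only stand in for real PRAW submissions so the example runs offline.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical stand-ins for PRAW submissions (made-up values) so this runs offline.
fake_submissions = [
    SimpleNamespace(title="Mongolian Chicken", score=120, id="abc123",
                    url="https://example.com/1", num_comments=14,
                    created=1577836800.0, selftext="Stir-fry the chicken."),
    SimpleNamespace(title="Vegan Stone Fruit Tart", score=95, id="def456",
                    url="https://example.com/2", num_comments=3,
                    created=1577923200.0, selftext=""),
]


def submission_to_row(submission):
    """Turn one submission into one row; every row carries the same seven keys."""
    return {
        "title": submission.title,
        "score": submission.score,
        "id": submission.id,
        "url": submission.url,
        "comms_num": submission.num_comments,
        "created": submission.created,
        "body": submission.selftext,
    }


rows = [submission_to_row(submission) for submission in fake_submissions]
topics_data = pd.DataFrame(rows)
print(topics_data.shape)
```

With real data you would loop over top_subreddit instead of fake_submissions.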
repl.it output when printing the resulting DataFrame:
title ... body
0 Garlic Butter Steak and Potatoes Skillet ...
1 This shoyu ramen broth is our family's favorit... ...
2 Wasn't sure how to properly thank a stranger f... ...
3 I'm working on moving all my mothers hand writ... ...
4 I made my first ever loaf of bread ...
.. ... ... ...
795 Linguine with Golden Beet and Beef Ragù ...
796 Vegan Stone Fruit Tart ...
797 Mojito Chicken Tacos ...
798 Mongolian Chicken ...
799 Stuffed Handmade flat bread ( Paratha )with Eg... ...
[800 rows x 7 columns]
Last note: don't forget to change your password and client_secret once your problem is solved, since both are visible in your question :)
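One way to keep the new credentials out of the script is to read them from environment variables. This is only a sketch: load_reddit_credentials and the REDDIT_* variable names are hypothetical, so pick whatever names suit your setup.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Collect keyword arguments for praw.Reddit from environment variables.

    The REDDIT_* names are an assumption made for this sketch, not a PRAW convention.
    """
    names = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "username": "REDDIT_USERNAME",
        "password": "REDDIT_PASSWORD",
    }
    missing = [variable for variable in names.values() if variable not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {keyword: env[variable] for keyword, variable in names.items()}


# Intended use in the scraper (needs praw and the variables set in your shell):
# reddit = praw.Reddit(user_agent='Food Parser', **load_reddit_credentials())
```

That way the script can be shared or posted without exposing the secret again.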
Upvotes: 1